## Variable Block Adder (1A)

## Copyright (c) 2022-2010 Young W. Lim.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled "GNU Free Documentation License".

Please send corrections (or suggestions) to youngwlim@hotmail.com.
This document was produced by using OpenOffice and Octave.

## Carry Skip Adder

$$
\mathrm{N}=\mathrm{R} \cdot \mathrm{k}
$$

R-2 groups

Ripple carry
Skip carry
Ripple carry


Any kill or generate condition results in divided (broken) critical paths
All FA's in R-2 groups must have the propagate condition
Variable Block Adder (1A)

## Carry Skip Adder

The maximal delay $\Delta$ of a Carry Skip Adder is encountered when carry is generated in the least-significant bit position,

- rippling through $k-1$ bit positions,
- skipping over $R-2=N / k-2$ groups in the middle,
- rippling to the $k-1$ bits of most significant group and
- being assimilated in the $N$-th bit position to produce the sum $S_{N}$ :

$$
\begin{aligned}
\Delta_{\mathrm{CSA}} & =(\mathrm{k}-1) \Delta_{\mathrm{rca}}+(\mathrm{R}-2) \Delta_{\mathrm{SKIP}}+(\mathrm{k}-1) \Delta_{\mathrm{rca}} \\
& =2(\mathrm{k}-1) \Delta_{\mathrm{rca}}+(\mathrm{R}-2) \Delta_{\mathrm{SKIP}} \\
& =2(\mathrm{k}-1) \Delta_{\mathrm{rca}}+(\mathrm{N} / \mathrm{k}-2) \Delta_{\mathrm{SKIP}}
\end{aligned}
$$

## Carry Skip Adder

$$
\begin{aligned}
\Delta_{\mathrm{CSA}} & =(\mathrm{k}-1) \Delta_{\mathrm{rca}}+(\mathrm{R}-2) \Delta_{\mathrm{SKIP}}+(\mathrm{k}-1) \Delta_{\mathrm{rca}} \\
& =2(\mathrm{k}-1) \Delta_{\mathrm{rca}}+(\mathrm{R}-2) \Delta_{\mathrm{SKIP}} \\
& =2(\mathrm{k}-1) \Delta_{\mathrm{rca}}+(\mathrm{N} / \mathrm{k}-2) \Delta_{\mathrm{SKIP}}
\end{aligned}
$$

Carry Skip Adder is faster than RCA at the expense of a few relatively simple modifications.

The delay is still linearly dependent on the size of the adder N ,

$$
N=R \cdot k
$$ however this linear dependence is reduced by a factor of $1 / k$



Oklobdzija: High-Speed VLSI arithmetic units : adders and multipliers

## Variable Block

however, unlike the carry select structure, the variable block adder must also worry about the delay from the Cin input through the block's ripple chain

Thus, after the carry chain passes the midpoint of the logic, the blocks begin decreasing in length.


This balances the path delays in the system and improves performance

The division of the overall structure into blocks depends on the details of the logic structure and the length of the entire computation


## Carry Skip Adder



$$
P=p_{i} \bullet p_{i+1} \bullet p_{i+2} \bullet \ldots \ldots \cdot p_{i+k-1}
$$

$\square-$


Oklobdzija: High-Speed VLSI arithmetic units : adders and multipliers

Variable Block Adder
(1A)

## Carry Skip Adder



Delay d1 from A, B to P - parallel, the same delay in each group
Delay d 2 from A, B to C - serial, the accumulated delay from Isb
Delay d3 from A, B, Ci to Co - ripple carry delay


## Carry Skip Adder



Delay d3 from A, B, Ci to Co - ripple carry delay


## Carry Skip Adder



## Carry Skip Adder



## Variable Block

the organization of the blocks in the variable block carry structure bears some similarity to the carry select structure
the early stages of the structure grow in length, with short blocks for the low order bits, building in length further in the chain in order to equalize the arrival time of the carry from the block with that of the previous block

## Carry Skip Adder

to equalize the arrival time of the carry from the block with that of the previous block

## Carry Skip Adder



## Carry Skip Adder



Oklobdzija: High-Speed VLSI arithmetic units : adders and multipliers

## Carry Skip Adder



Oklobdzija: High-Speed VLSI arithmetic units : adders and multipliers

## Carry Skip Adder



Oklobdzija: High-Speed VLSI arithmetic units : adders and multipliers
Variable Block Adder
17

## Carry Skip Adder

All carries propagated more quickly by varying the block sizes
Accordingly the initial blocks of the adder are made smaller, so as to quickly detect carry generates that must be propagated

The middle blocks are made larger because they are not the problem case,

And then the most significant blocks are made smaller so that the late arriving carry inputs can be processed quickly
https://en.wikipedia.org/wikik/Carry-skip_adder


## Carry Skip Adder

The longest path length through the carry skip block is potentially much shorter than the path from carry-in to carry-out through a ripple carry block.

However, the carry skip block has a slightly longer path from the least significant <g,t> input to carry output

Hence, this adder will only be faster when skipping groups makes up for the extra gate overhead accumulated by going from generate/transfer to carry-out

The maximum path length through a one block wide carry skip adder is the same as though a ripple carry adder, since the bottom block in a skip adder is a ripple carry


## Two separate ripple carry adders

Time
Cin signal is used to determine which adder's outputs should be used

if the Cin signal is true,
the output (carries) are selected from the true-Cin-adder
if the Cin signal is false, the output (carries) are selected from the false-Cin-adder
redundant hardware removes
Cin data dependency
first start redundant computation

later select the correct one

## Timing in broken carry chains

Time


These computations can take place before the completion of the previous columns, since they do not depend on the actual value of the Cin signal

## Time


the length of the adders and the breakpoint are carefully chosen such that the adders finish computations just as their Cin become available

## Carry Select Fast Carry Logic



## Variable Block


decreasing
length $\begin{array}{r}\text { Increasing } \\ \text { length }\end{array}$

$$
\mathrm{d} 1<\mathrm{d} 2
$$

short blocks for the low order bits
the delay from the Cin input through the block's ripple chain
https://en.wikipedia.org/wiki/Carry-lookahead_adder


## Variable Block

We use a block length from low order to high order cells of $2,2,4,5,7,5,4,2,1$ for a normal 32 bit structure
$8+17+7$
The first and last block in each adder is a simple ripple carry chain, while all other blocks use the variable block structure.

https://en.wikipedia.org/wiki/Carry-lookahead_adder

## Variable Block

Delay values of the variable block carry chain relative to other carry chains

The idea behind Variable Block Adder (VBA) is to minimize the longest critical path in the carry chain of a Carry Skip Adder, while allowing the groups to take different sizes.

Such an optimization in general does not result in an increased complexity as compared to the Carry Skip Adder

## Variable Block

The first and last blocks are smaller, and the intermediate blocks are larger.

That compensates for the critical paths originating from the ends by shortening the length of the path used for the carry signal to ripple in the end groups, allowing carry to skip over larger groups in the middle.

There are two important consequences of this optimization:
(a) the total delay is reduced as compared to a Carry Skip Adder
(b) the delay dependency is not a linear function of the adder size N as in a Carry Skip Adder.
This dependency follows a square root function of $N$ instead

## Variable Block

For an optimized VBA, it is possible to obtain a closed form solution expressing this delay dependency

It is also possible to extend this approach to multi-levels of carry skip as done in a determination of the optimal sizes of the blocks on the first and higher levels of skip blocks is a linear programming problem, which does not yield a closed form solution.

Such types of problems are solved with the use of dynamic programming techniques.

The speed of such a mult-level VBA adder surpasses single-level VBA and that of fixed group Carry-Lookahead Adder (CLA).

## Variable Block

For an optimized VBA, it is possible to obtain a closed form solution expressing this delay dependency which is given as:
where: c1 , c2 , c3 are constants.
$\Delta_{V B A}=c_{1}+\sqrt{c_{2} N+c_{3}}$

It is also possible to extend this approach to multiple levels of carry skip as done.

## Variable Block

(1) the speed of the logic gates used for CMOS implementation depends on the output load: fan-out, as well as the number of inputs: fan-in.
(2) CLA implementation is characterized with a large fan-in which limits the available size of the groups.

On the other hand VBA implementation is simple.

Thus, it seems that CLA has passed the point of diminishing returns as far as an efficient implementation is concerned.

## Variable Block

This example also points to the importance of modeling and incorporating appropriate technology parameters into the algorithm.

Most of the computer arithmetic algorithms developed in the past use a simple constant gate delay model.
(2.) a fixed-group CLA is not the best way to build an adder.

It is a sub-optimal structure which after being optimized for speed, consists of groups that are different in size yielding a largely irregular structure

## Variable Block

There are other advantages of VBA adder that are direct result of its simplicity and efficient optimization of the critical path.

Those advantages are exhibited in the lower area and power consumption while retaining its speed.

Thus, VBA has the lowest energy-delay product as compared to the other adders in its class.

## Delay model

Oklobdzija addition VLSI

On implementing addition in VLSI technology

Delay dependency : Fan-out, Fan-in, Delay estimates:

D_NAND $=1+0.3$ Fo +0.5 (Fi-2)
D_NOR $=1+0.5 \mathrm{Fo}+0.5(\mathrm{Fi}-2)$
D_NAND $=0.7+0.3$ Fo
$t$ denote the time required for a carry signal to ripple across a bit $T$ denote the time required for the signal to skip over a group of bits $m$ denotes the optimal number of groups for an n-bit carry chain $m$ is the smallest positive integer satisfying
$n \leq m+\frac{1}{2} m T+\frac{1}{4} m^{2} T+\left(1-(-1)^{m}\right) \frac{1}{8} T$

## Delay model

$n$ : the number of bits in a carry skip adder
$m$ : the number of groups into which the bits are divided
$x_{1}, \ldots, x_{m}$ : the sizes of the groups beginning with the most significant bit
$T$ : the time required for a carry signal to skip over a group of bits
To be precise we should write $T=T(x)$ to indicate that
$T$ depends on the size $x$ of the group over which the carry is skipped
However, T changes very slowly with $x$ over the range of group sizes
So we assume that T is constant
For a given $n$, the following three step procedure gives
An optimal way of dividing an $n$ bit adder into groups of bits

## Variable Block



- total $n=32$ bits
- $m=9$ groups
- $i$-th group has $x_{i}$ bits (size)
- constant skip delay $T=T\left(x_{i}\right)$


## Carry Skip Adder


$t$ denote the time required for a carry signal to ripple across a bit
$T$ denote the time required for the signal to skip over a group of bits $m$ denotes the optimal number of groups for an n-bit carry chain

## Procedure

(I) Let $m$ be the smallest positive integer such that

$$
n \leq m+\frac{1}{2} m T+\frac{1}{4} m^{2} T+\left(1-(-1)^{m}\right) \frac{1}{8} T
$$

(II) Let

$$
y_{i}=\min \{1+i T, 1+(m+1-i) T\}, \quad i=1, \ldots, m
$$

and construct a histogram whose $i$-th column has height $y_{i}$
for example, for $\mathrm{T}=3$, and $\mathrm{n}=48$, we have $\mathrm{m}=7$
(III) It is easily verified that the area of the histogram in (II) is

$$
m+\frac{1}{2} m T+\frac{1}{4} m^{2} T+\left(1-(-1)^{m}\right) \frac{1}{8} T \geq n
$$

so these are at least $n$ unit squares in the histogram starting with the first row, shade in $n$ of the squares, row by row Let $x_{i}$ denote the number of shaded squares in column $i$ of the histogram, $i=1, \ldots, m$

Oklobdzija: High-Speed VLSI arithmetic units : adders and multipliers

## Example 1-(1)

For a 48 bit adder we have, from Figure
$x_{1}=x_{7}=4, x_{2}=x_{6}=7, x_{3}=8, x_{4}=x_{5}=9$
The maximum delay is experienced by a signal generated in the second bit position and terminating in the $47^{\text {th }}$ bit position
the delay is $m T=21$

- total $n=48$ bits
- $m=7$ groups
- $i$-th group has $x_{i}$ bits (size)
- constant skip delay $T=T\left(x_{i}\right)=3$


## Example 1 - (2)

$$
\begin{aligned}
& n \leq m+\frac{1}{2} m T+\frac{1}{4} m^{2} T+\left(1-(-1)^{m}\right) \frac{1}{8} T \\
& 48 \leq m+\frac{3}{2} m+\frac{3}{4} m^{2}+\left(1-(-1)^{m}\right) \frac{3}{8}
\end{aligned}
$$

- total $n=48$ bits
- $m=7$ groups
- $i$-th group has $x_{i}$ bits (size)
- constant skip delay $T=T\left(x_{i}\right)=3$

| 1 | 4 |
| ---: | ---: |
| 2 | 8 |
| 3 | 15 |
| 4 | 22 |
| 5 | 32 |
| 6 | 42 |
| 7 | 55 |
| 8 | 68 |
| 9 | 84 |
| 10 | 100 |



## Example 1 - (3)

| $n=48$ |
| :--- |
| $m=7$ |
| $T=3$ |

$y_{i}$

## Example 1 - (4)



Oklobdzija: High-Speed VLSI arithmetic units : adders and multipliers


## Example 2-(1)

consider a 54 bit adder

From 2(i), we see that again $\mathrm{m}=7$.

If we shade 54 squares in Figure, we see that
$x_{1}=x_{7}=4, x_{2}=x_{6}=7, x_{3}=x_{5}=10, x_{4}=12$
Yields an optimal division of the adder.
Again the maximum delay is $\mathrm{mT}=21$

- total $n=54$ bits
- $m=7$ groups
- $i$-th group has $x_{i}$ bits (size)
- constant skip delay $T=T\left(x_{i}\right)=3$


## Example 2-(2)

$$
\begin{aligned}
& n \leq m+\frac{1}{2} m T+\frac{1}{4} m^{2} T+\left(1-(-1)^{m}\right) \frac{1}{8} T \\
& 54 \leq m+\frac{3}{2} m+\frac{3}{4} m^{2}+\left(1-(-1)^{m}\right) \frac{3}{8}
\end{aligned}
$$

- total $n=54$ bits
- $m=7$ groups
- $i$-th group has $x_{i}$ bits (size)
- constant skip delay $T=T\left(x_{i}\right)=3$

| 1 | 4 |
| ---: | ---: |
| 2 | 8 |
| 3 | 15 |
| 4 | 22 |
| 5 | 32 |
| 6 | 42 |
| 7 | 55 |
| 8 | 68 |
| 9 | 84 |
| 10 | 100 |



## Example 2-(3)

$\substack{n=54 \\ m=7 \\ T=3}$

## Example 2-(4)



Oklobdzija: High-Speed VLSI arithmetic units : adders and multipliers


## Example 3-(1)

## Consider a 64 bit adder

From 2(i) we compute m=8.
the corresponding histogram is shown in Figure

The optimal gorup sizes are:
$x_{1}=x_{8}=4, x_{2}=x_{7}=7, x_{3}=x_{6}=10, x_{4}=x_{5}=11$
The delay of the longest signal is $\mathrm{mT}=24$

- total $n=64$ bits
- $m=8$ groups
- $i$-th group has $x_{i}$ bits (size)
- constant skip delay $T=T\left(x_{i}\right)=3$


## Example 3 - (2)

$$
\begin{aligned}
& n \leq m+\frac{1}{2} m T+\frac{1}{4} m^{2} T+\left(1-(-1)^{m}\right) \frac{1}{8} T \\
& 64 \leq m+\frac{3}{2} m+\frac{3}{4} m^{2}+\left(1-(-1)^{m}\right) \frac{3}{8}
\end{aligned}
$$

- total $n=64$ bits
- $m=7$ groups
- $i$-th group has $x_{i}$ bits (size)
- constant skip delay $T=T\left(x_{i}\right)=3$

| 1 | 4 |
| ---: | ---: |
| 2 | 8 |
| 3 | 15 |
| 4 | 22 |
| 5 | 32 |
| 6 | 42 |
| 7 | 55 |
| 8 | 68 |
| 9 | 84 |
| 10 | 100 |



## Example 3-(3)

$$
\begin{aligned}
& n=64 \\
& m=8 \\
& T=3 \\
& 7 \\
& y_{i}=\min \{1+i T, 1+(m+1-i) T\}, \quad i=1, \ldots, m \\
& y_{1}=\min \{1+1 \cdot T, 1+8 \cdot T\}=1+1 \cdot T=4 \\
& 0 \leq x_{1} \leq 1+1 \cdot T=4 \\
& y_{2}=\min \{1+2 \cdot T, 1+7 \cdot T\}=1+2 \cdot T=7 \\
& 0 \leq x_{2} \leq 1+2 \cdot T=7 \\
& y_{3}=\min \{1+3 \cdot T, 1+6 \cdot T\}=1+3 \cdot T=10 \\
& 0 \leq x_{3} \leq 1+3 \cdot T=10 \\
& y_{4}=\min \{1+4 \cdot T, 1+5 \cdot T\}=1+4 \cdot T=13 \\
& 0 \leq x_{4} \leq 1+4 \cdot T=13 \\
& y_{5}=\min \{1+5 \cdot T, 1+4 \cdot T\}=1+4 \cdot T=13 \\
& 0 \leq x_{5} \leq 1+4 \cdot T=13 \\
& y_{6}=\min \{1+6 \cdot T, 1+3 \cdot T\}=1+3 \cdot T=10 \\
& 0 \leq x_{6} \leq 1+3 \cdot T=10 \\
& y_{7}=\min \{1+7 \cdot T, 1+2 \cdot T\}=1+2 \cdot T=7 \\
& 0 \leq x_{7} \leq 1+2 \cdot T=7 \\
& y_{8}=\min \{1+8 \cdot T, 1+1 \cdot T\}=1+1 \cdot T=4 \\
& 0 \leq x_{8} \leq 1+1 \cdot T=4
\end{aligned}
$$

## Example 3 - (4)



Oklobdzija: High-Speed VLSI arithmetic units : adders and multipliers


## Ripple delay and skip delay

For a 32-bit adder, and the VBA scheme, we divided the Carry chain into blocks of sizes 1, ,3, 5, 7, 7, 5, 3, 1.

Why this division is optimal?

Let $t$ denote the time required for a carry signal to ripple across a one bit in the carry chain, and let $T$ denote the time required for the signal to skip over a group of bits.

By simulation of the blocks, we have found that $t=0.8 \mathrm{~ns}$ and $T=165 \mathrm{~ns}$.
To simplify our analysis, we normalize them So that $t=1$ and $T=2$.

Then we apply the theory developed for finding the optimal division of a carry chain

$$
\begin{array}{cl}
\Delta_{\text {rca }}=1 & \text { ripple delay over a bit } \\
\Delta_{\text {SKIP }}=\mathrm{T} & \text { skip delay over a group }
\end{array}
$$

## Maximum propagation time P

Lemma 1 When the bits of a carry skip adder are grouped according to the scheme (i)-(iii), the maximum propagation time of a carry signal is $m T$

- $n$ bits
- m groups

The carry generated at the $2^{\text {nd }}$ bit position and terminating at the ( $n-1$ ) clearly has propagation time $m T$. We must show that any other signal has propagation time smaller than or equal to $m T$.

Consider a signal originating in the $i$-th group and terminating in the $j$-th, $\mathrm{i}<\mathrm{j}$.
Denote its propagation time by P. Clearly

$$
P \leq\left(x_{i}-1\right)+(j-i-1) T+\left(x_{j}-1\right) \leq m T
$$

## Propagation delay P



## Propagation delay P

$$
\begin{gathered}
P \leq\left(x_{i}-1\right)+(j-i-1) T+\left(x_{j}-1\right) \leq m T \\
x_{i} \leq \min \{1+i T, 1+(m+1-i) T\} \\
x_{j} \leq \min \{1+j T, 1+(m+1-j) T\}
\end{gathered}
$$

Assume a carry signal is generated in the $i$-th group and terminated in the $j$-th, $\mathrm{i}<\mathrm{j}$.
$P$ denotes propagation time

$$
P \leq \min \{1+i T, 1+(m+1-i) T\}+\min \{1+j T, 1+(m+1-j) T\}+(j-i-1) T-2
$$

## Delay model



Oklobdzija: High-Speed VLSI arithmetic units : adders and multipliers

## Three cases

Case 1

$$
\begin{aligned}
& \min \{1+i T, 1+(m+1-i) T\}=1+i T \\
& \min \{1+j T, 1+(m+1-j) T\}=1+j T
\end{aligned}
$$



Case 2

$$
\begin{aligned}
& \min \{1+i T, 1+(m+1-i) T\}=1+i T \\
& \min \{1+j T, 1+(m+1-j) T\}=1+(m+1-j) T
\end{aligned}
$$

Case 3

$$
\min \{1+i T, 1+(m+1-i) T\}=1+(m+1-i) T
$$

$\square$


$$
\min \{1+j T, 1+(m+1-j) T\}=1+(m+1-j) T
$$

## Case 1

1. First, assume

$$
\begin{aligned}
& \quad \min \{1+i T, 1+(m+1-i) T\}=1+i T \\
& \quad \min \{1+j T, 1+(m+1-j) T\}=1+j T \\
& 1+j T \leq 1+(m+1-j) T \longrightarrow 2 j T \leq(m+1) T \Rightarrow 2 j T-T \leq m T \\
& P<1+i T+1+(m+1-i) T+(j-i-1) T-2=2 j T-T \leq m T \\
& \begin{aligned}
& x_{i} \leq \min \{1+i T, 1+(m+1-i) T\}=1+i T \\
& x_{j} \leq \min \{1+j T, 1+(m+1-j) T\}=1+j T \\
& P \leq \min \{1+i T, 1+(m+1-i) T\}+\min \{1+j T, 1+(m+1-j) T\}+(j-i-1) T-2 \\
&=1+i T+1+j T+(j-i-1) T-2 \\
&=2 j T-T \leq m T
\end{aligned}
\end{aligned}
$$

Oklobdzija: High-Speed VLSI arithmetic units : adders and multipliers

## Case 2

2. Now, assume

$$
\begin{aligned}
& \min \{1+i T, 1+(m+1-i) T\}=1+i T \\
& \min \{1+j T, 1+(m+1-j) T\}=1+(m+1-j) T \\
& P \leq 1+i T+1+(m+1-i) T+(j-i-1) T-2=m T \\
& P \leq \min \{1+i T, 1+(m+1-i) T\}+\min \{1+j T, 1+(m+1-j) T\}+(j-i-1) T-2 \\
& P \leq 1+i T+1+(m+1-j) T+(j-i-1) T-2=m T
\end{aligned}
$$

## Case 3

3. Finally, assume

$$
\begin{aligned}
& \quad \min \{1+i T, 1+(m+1-i) T\}=1+(m+1-i) T \\
& \quad \min \{1+j T, 1+(m+1-j) T\}=1+(m+1-j) T \\
& P \leq 1+(m+1-i) T+1+(m+1-j) T+(j-i-1) T-2=2 m T-(2 i T-T) \leq 2 m T-m T=m T \\
& P \leq \min \{1+i T, 1+(m+1-i) T\}+\min \{1+j T, 1+(m+1-j) T\}+(j-i-1) T-2 \\
& P \leq 1+(m+1-i) T+1+(m+1-j) T+(j-i-1) T-2=2(m+1-i) T-T \\
& \leq 2(m+1-i) T-T=2 m T-(2 i T-T)=m T \\
& 1+i T \geq 1+(m+1-i) T \longrightarrow 2 i T \geq(m+1) T \longrightarrow \\
& \\
& \\
& \\
& \\
& \\
& \\
& \\
& \\
& \\
&
\end{aligned}
$$

## Maximum delay of a carry signal

Lemma 2 Let $D$ denote the maximum delay of a carry signal in a $n$ bit carry skip adder with group sizes chosen optimally. Then

- $n$ bits
- m groups
$(m-1) T \leq D \leq m T$

Since we have exhibited a division of the carry chain into groups In such a way that the maximum delay of a carry signal is $m T$ We clearly have $D \leq m T$

$(m-1) T$
$\square$

## Even number $\mathbf{r}=\mathbf{2 k}$ of groups (1)

Let $x_{1}, x_{2}, \ldots, x_{r}$ denote the optimal group sizes corresponding to $D$.

- $n$ bits

For the moment assume that $r=2 k$ is even.

- m groups

By considering carries originating in group $i$
and terminating in group $, r-i+1, i=1, \ldots, k$
we deduce the following system inequalities
the maximum delay of a carry signal is $m T$

$$
D \leq m T
$$

- $n$ bits
- r groups optimal

$$
\begin{aligned}
& 1, \quad 2 k \quad=r-(1-1), \quad i=1 \\
& 2, \quad 2 k-1=r-(2-1), \quad i=2 \\
& k, \quad k+1=r-(k-1), \quad i=k
\end{aligned}
$$

## Even number $\mathbf{r}=\mathbf{2 k}$ of groups (2)

$$
r=2 k
$$

$P \leq\left(x_{i}-1\right)+(j-i-1) T+\left(x_{j}-1\right) \leq m T$

$$
\begin{aligned}
& \left(x_{1}-1\right)+(r-2) T+\left(x_{r}-1\right) \leq D \\
& \left(x_{2}-1\right)+(r-4) T+\left(x_{r-1}-1\right) \leq D \\
& \left(x_{3}-1\right)+(r-6) T+\left(x_{r-2}-1\right) \leq D
\end{aligned}
$$

$$
\left(x_{k}-1\right)+(r-2 k) T+\left(x_{k+1}-1\right) \leq D
$$

$$
m T \quad \leq D
$$

- $n$ bits
- rgroups



## Even number $\mathbf{r}=\mathbf{2 k}$ of groups (3)

$$
r=2 k
$$

$$
P \leq\left(x_{i}-1\right)+(j-i-1) T+\left(x_{j}-1\right) \leq m T
$$

- $n$ bits
- rgroups

| $i=1$ | $\cdots$ |  |  |  |  | W | $j=r$ |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| $i=2$ |  | m |  |  | m |  | $j=r-1$ |
| $i=3$ |  |  |  |  |  |  | $j=r-2$ |
| $i=k$ |  |  | m | W |  |  | $j=r-k+1$ |
|  |  |  |  |  |  | $\square$ |  |

$$
\begin{aligned}
& (j-i-1)=r-2=2(k-1) \\
& (j-i-1)=r-4=2(k-2) \\
& (j-i-1)=r-6=2(k-3) \\
& (j-i-1)=r-2 k=2(k-k)
\end{aligned}
$$

## Even number $\mathbf{r}=\mathbf{2 k}$ of groups (4)

$$
r=2 k
$$

$$
P \leq\left(x_{i}-1\right)+(j-i-1) T+\left(x_{j}-1\right) \leq m T
$$

- $n$ bits
- rgroups

$$
\begin{aligned}
& \left(x_{1}-1\right)+2(k-1) T+\left(x_{2 k}-1\right) \leq D \quad i=1 \\
& \left(x_{2}-1\right)+2(k-2) T+\left(x_{2 k-1}-1\right) \leq D \quad i=2 \\
& \left(x_{3}-1\right)+2(k-3) T+\left(x_{2 k-2}-1\right) \leq D \quad i=3
\end{aligned}
$$

$$
\sum_{i=1}^{r} x_{i}=n
$$

$$
\sum_{i=1}^{k} i=\frac{k(k+1)}{2}
$$

$$
\left(x_{k}-1\right)+2(k-k) T+\left(x_{k+1}-1\right) \leq D \quad i=k
$$

$$
2 k=r
$$

$$
r T \quad \leq D
$$

## Even number $\mathbf{r}=\mathbf{2 k}$ of groups (5)

$$
r=2 k
$$

$$
n-2 k+(k+1) r T-k(k+1) T \leq(k+1) D
$$

$$
\frac{n-2 k}{k+1}+r T-k T \leq D
$$

$$
\frac{n-2 k}{k+1}+2 k T-k T \leq D
$$

$$
\frac{n-2(k+1)+2}{k+1}+k T \leq D
$$

$$
\frac{n+2}{k+1}+(k+1) T-(T+2) \leq D
$$

$$
2 \sqrt{(n+2) T}-(T+2) \leq D
$$

$$
\sqrt{4 n T+8 T}-(T+2) \leq D
$$

## Odd number $\mathbf{r}=\mathbf{2 k + 1}$ of groups (1)

Let $x_{1}, x_{2}, \ldots, x_{r}$ denote the optimal group sizes corresponding to $D$.

- $n$ bits

For the moment assume that $r=2 k+1$ is odd.

- m groups

By considering carries originating in group $i$
and terminating in group $, r-i+1, i=1, \ldots, k$
we deduce the following system inequalities
the maximum delay of a carry signal is $m T$

$$
D \leq m T
$$

- $n$ bits
- r groups optimal

$$
\begin{aligned}
& 1, \quad 2 k+1=r-(1-1), \quad i=1 \\
& 2, \quad 2 k \quad=r-(2-1), \quad i=2 \\
& k, \quad k+2=r-(k-1), \quad i=k
\end{aligned}
$$

## Odd number $\mathbf{r}=\mathbf{2 k + 1}$ of groups (2)

$$
r=2 k+1
$$

$$
P \leq\left(x_{i}-1\right)+(j-i-1) T+\left(x_{j}-1\right) \leq m T
$$

$$
\left(x_{1}-1\right)+(r-2) T+\left(x_{r}-1\right) \leq D
$$

$$
\left(x_{2}-1\right)+(r-4) T+\left(x_{r-1}-1\right) \leq D
$$

$$
\left(x_{3}-1\right)+(r-6) T+\left(x_{r-2}-1\right) \leq D
$$

$$
\left(x_{k}-1\right)+(r-2 k) T+\left(x_{r-k+1}-1\right) \leq D
$$

$$
k T+\left(x_{k+1}-1\right) \leq D
$$

- $n$ bits
- rgroups



## Odd number $\mathbf{r}=\mathbf{2 k + 1}$ of groups (3)

$$
r=2 k+1
$$

$$
P \leq\left(x_{i}-1\right)+(j-i-1) T+\left(x_{j}-1\right) \leq m T
$$

- $n$ bits
- rgroups



## Odd number $\mathbf{r}=\mathbf{2 k + 1}$ of groups (4)

$$
r=2 k+1
$$

$$
P \leq\left(x_{i}-1\right)+(j-i-1) T+\left(x_{j}-1\right) \leq m T
$$

$$
\begin{aligned}
& \left(x_{1}-1\right)+2(k-1) T+T+\left(x_{2 k+1}-1\right) \leq D \\
& \left(x_{2}-1\right)+2(k-2) T+T+\left(x_{2 k}-1\right) \leq D \\
& \left(x_{3}-1\right)+2(k-3) T+T+\left(x_{2 k-1}-1\right) \leq D
\end{aligned}
$$

$$
\left(x_{k}-1\right)+2(k-k) T+T+\left(x_{k+2}-1\right) \leq D
$$

$$
k T+\left(x_{k+1}-1\right) \leq D
$$

$$
n-2 k-1+(r+1) k T-k(k+1) T \leq(k+1) D
$$

- $n$ bits
- rgroups

$$
\begin{aligned}
& \sum_{i=1}^{r} x_{i}=n \\
& \sum_{i=1}^{k} i=\frac{k(k+1)}{2} \\
& 2 k+1=r
\end{aligned}
$$

## Odd number $\mathbf{r}=\mathbf{2 k + 1}$ of groups (5)

$$
\begin{aligned}
r= & 2 k+1 \\
& n-2 k-1+(r+1) k T-k(k+1) T \leq(k+1) D \\
& \frac{n-2 k-1}{(k+1)}+\frac{(r+1) k T}{(k+1)}-k T \leq D \\
& \frac{n-2 k-1}{(k+1)}+\frac{(2(k+1)) k T}{(k+1)}-k T \leq D \\
& \frac{n-2 k-1}{(k+1)}+k T \leq D \\
& \frac{n-2(k+1)+1}{(k+1)}+(k+1) T-T \leq D \\
& \frac{(n+1)}{(k+1)}+(k+1) T-(T+2) \leq D \\
& 2 \cdot \sqrt{(n+1) T}-(T+2) \leq D \\
& \sqrt{4 n T+4 T}-(T+2) \leq D
\end{aligned}
$$

$$
\begin{aligned}
& \text { arith mean } \geq \text { geo mean } \\
& \frac{(n+1)}{(k+1)}+(k+1) T \geq 2 \cdot \sqrt{(n+1) T} \\
& \text { min when } \quad \frac{(n+1)}{(k+1)}=(k+1) T \\
& \\
& \\
& \\
& \\
& \\
& (k+1)=\sqrt{\frac{n+1}{T}}=(k+1)^{2}
\end{aligned}
$$

## Delay model

$$
\begin{array}{ll}
r=2 k & \sqrt{4 n T+8 T}-(T+2) \leq D \\
r=2 k+1 & \sqrt{4 n T+4 T}-(T+2) \leq D \\
& \sqrt{4 n T+4 T}-(T+2) \leq \sqrt{4 n T+8 T}-(T+2) \\
& \sqrt{4 n T+4 T}-(T+2) \leq D
\end{array}
$$

We will not produce an upper bound on mT .
Since $m$ is the smallest positive integer satisfying

$$
\begin{aligned}
& (m-1)+\frac{1}{2}(m-1) T+\frac{1}{4}(m-1)^{2} T+\left(1-(-1)^{m-1}\right) \frac{T}{8}<n \\
& n \leq m+\frac{1}{2} m T+\frac{1}{4} m^{2} T+\left(1-(-1)^{m}\right) \frac{1}{8} T
\end{aligned}
$$

## Delay model

$$
\begin{aligned}
& (m-1)+\frac{1}{2}(m-1) T+\frac{1}{4}(m-1)^{2} T+\left(1-(-1)^{m-1}\right) \frac{T}{8}<n \\
& (m-1)+\frac{1}{2}(m-1) T+\frac{1}{4}(m-1)^{2} T+\left(1-(-1)^{m-1}\right) \frac{T}{8}+1 \leq n \\
& (m)+\frac{1}{2}(m T-T)+\frac{1}{4}\left(m^{2} T-2 m T+T\right)+\left(1-(-1)^{m-1}\right) \frac{T}{8} \leq n \\
& m-\frac{1}{4} T+\frac{1}{4} m^{2} T+\left(1-(-1)^{m-1}\right) \frac{T}{8} \leq n \\
& \frac{1}{4} m^{2} T \leq n-m+\frac{1}{4} T-\left(1-(-1)^{m-1}\right) \frac{T}{8} \\
& \left(\frac{1}{4} m^{2} T\right) \cdot 4 T \leq\left(n-m+\frac{1}{4} T-\left(1-(-1)^{m-1}\right) \frac{T}{8}\right) \cdot 4 T \\
& (m T+2)^{2} \leq 4+4 n T+T^{2}-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2} \\
& m^{2} T^{2} \leq 4 n T-4 m T+T^{2}-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2} \\
& m^{2} T^{2}+4 m T \leq 4 n T+T^{2}-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2} \\
& \left(m m T+4 \leq 4+4 n T+T^{2}-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}\right. \\
& (m T+2) \leq \sqrt{4+4 n T+T^{2}-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}
\end{aligned}
$$

## Delay model

$$
\begin{aligned}
& r=2 k \\
& r=2 k+1 \quad \sqrt{4 n T+8 T}-(T+2) \leq D \\
& \sqrt{4 n T+4 T}-(T+2) \leq D \\
& m T \leq-2+\sqrt{4+4 n T+T^{2}-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}} \\
& m T-D \leq T+\frac{T^{2}-8 T+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}{\sqrt{4 n T+8 T}+\sqrt{4 n T+4+T^{2}-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}}
\end{aligned} \text { even r} \text { } \begin{aligned}
& m T-D \leq T+\frac{T^{2}-4 T+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}{\sqrt{4 n T+4 T}+\sqrt{4 n T+4+T^{2}-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}} \text { odd r}
\end{aligned}
$$

## Delay model

$$
\begin{array}{r}
m T \leq-2+\sqrt{4 n T+T^{2}+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}} \\
r=2 k \quad \sqrt{4 n T+8 T}-(T+2) \leq D \quad m T-D \leq T+\frac{T^{2}-8 T+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}{\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}} \\
r=2 k+1 \quad \sqrt{4 n T+4 T}-(T+2) \leq D \\
m T-D \leq T+\frac{T^{2}-4 T+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}{\sqrt{4 n T+4 T}+\sqrt{4 n T+T^{2}+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}}
\end{array}
$$

## $r=2 k$ even $r$

$$
r=2 k
$$

$$
\begin{gathered}
m T \leq-2+\sqrt{4 n T+T^{2}+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}} \\
-D \leq-\sqrt{4 n T+8 T}+(T+2) \\
m T-D \leq T-\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}} \\
m T-D \leq T+\frac{T^{2}-8 T+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}{\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}}
\end{gathered}
$$

## $r=2 k$ even $r$

$$
\begin{aligned}
& r=2 k \\
& X \stackrel{\text { wt }}{\underline{=}} 4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}
\end{aligned}
$$

$$
\begin{aligned}
& m T \leq-2+\sqrt{4 n T+T^{2}+X} \\
& -D \leq-\sqrt{4 n T+8 T}+(T+2)
\end{aligned}
$$

$$
\begin{aligned}
& m T-D \leq T-\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+X} \\
& (-\sqrt{a}+\sqrt{b}) \cdot \frac{(+\sqrt{a}+\sqrt{b})}{(+\sqrt{a}+\sqrt{b})}=\frac{(-a+b)}{(+\sqrt{a}+\sqrt{b})} \\
& \left(-\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+X}\right) \cdot \frac{\left(\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+X}\right)}{\left(\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+X}\right)} \\
& \left\{-\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+X}\right\} \cdot\left\{+\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+X}\right\} \\
& \quad=-(4 n T+8 T)+\left(4 n T+T^{2}+X\right)=T^{2}-8 T+X
\end{aligned}
$$

$$
m T-D \leq T+\frac{T^{2}-8 T+X}{\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+X}}
$$

## $r=2 k+1$ odd $r$

$$
\begin{aligned}
& r=2 k+1 \\
& \quad \sqrt{4 n T+4 T}-(T+2) \leq D \\
& \quad-\sqrt{4 n T+4 T}+(T+2) \geq-D
\end{aligned}
$$

$$
m T \leq-2+\sqrt{4 n T+T^{2}+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}
$$

$$
-D \leq-\sqrt{4 n T+4 T}+(T+2)
$$

$$
m T-D \leq T-\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}
$$

$$
m T-D \leq T+\frac{T^{2}-4 T+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}{\sqrt{4 n T+4 T}+\sqrt{4 n T+T^{2}+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}}}
$$

## $r=2 k+1$ odd $r$

$$
\begin{aligned}
& r=2 k+1 \\
& X \xlongequal{\text { wat }} 4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}
\end{aligned}
$$

$$
\begin{aligned}
& m T \leq-2+\sqrt{4 n T+T^{2}+X} \\
& -D \leq-\sqrt{4 n T+4 T}+(T+2)
\end{aligned}
$$

$$
\begin{aligned}
& m T-D \leq T-\sqrt{4 n T+4 T}+\sqrt{4 n T+T^{2}+X} \\
& (-\sqrt{a}+\sqrt{b}) \cdot \frac{(+\sqrt{a}+\sqrt{b})}{(+\sqrt{a}+\sqrt{b})}=\frac{(-a+b)}{(+\sqrt{a}+\sqrt{b})} \\
& \left(-\sqrt{4 n T+4 T}+\sqrt{4 n T+T^{2}+X}\right) \cdot \frac{\left(\sqrt{4 n T+4 T}+\sqrt{4 n T+T^{2}+X}\right)}{\left(\sqrt{4 n T+4 T}+\sqrt{4 n T+T^{2}+X}\right)} \\
& \left\{-\sqrt{4 n T+4 T}+\sqrt{4 n T+T^{2}+X}\right\} \cdot\left\{+\sqrt{4 n T+4 T}+\sqrt{4 n T+T^{2}+X}\right\} \\
& =-(4 n T+4 T)+\left(4 n T+T^{2}+X\right)=T^{2}-4 T+X
\end{aligned}
$$

$$
m T-D \leq T+\frac{T^{2}-4 T+X}{\sqrt{4 n T+4 T}+\sqrt{4 n T+T^{2}+X}}
$$

## $r=2 k$ even $r$

$X \xlongequal{\text { def }} 4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}$

- odd $m \quad X=4$
- even $m \quad X=4-T^{2}$

$$
\sqrt{4 n T+T^{2}+4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}} \quad=\sqrt{4 n T+T^{2}+X}
$$

- even $m$
- odd $m$

$$
\begin{aligned}
& (-1)^{m-1}=1 \\
& \left(1-(-1)^{m-1}\right)=0
\end{aligned}
$$

$$
\sqrt{4 n T+4+T^{2}}
$$

$$
\begin{aligned}
& (-1)^{m-1}=-1 \\
& \left(1-(-1)^{m-1}\right)=2 \\
& \sqrt{4 n T+T^{2}+4-T^{2}} \\
& =\sqrt{4 n T+4}
\end{aligned}
$$

$$
m T-D \leq T+\frac{T^{2}-8 T+X}{\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+X}}
$$

$$
\leq T+\frac{T^{2}-8 T+4}{\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+4}}
$$

$$
\leq T+\frac{-8 T+4}{\sqrt{4 n T+8 T}+\sqrt{4 n T+4}}
$$

$$
m T-D \leq T+\frac{T^{2}-4 T+X}{\sqrt{4 n T+4 T}+\sqrt{4 n T+T^{2}+X}}
$$

$$
\leq T+\frac{T^{2}-4 T+4}{\sqrt{4 n T+4 T}+\sqrt{4 n T+T^{2}+4}}
$$

$$
\leq T+\frac{-4 T+4}{\sqrt{4 n T+4 T}+\sqrt{4 n T+4}}
$$

## $r=2 k$ even $r$

$$
\begin{aligned}
& m T-D \leq T+\frac{T^{2}-8 T+X}{\sqrt{4 n T+8 T}+\sqrt{4 n T+T^{2}+X}} \\
& m T-D \leq T+\frac{T^{2}-4 T+X}{\sqrt{4 n T+4 T}+\sqrt{4 n T+T^{2}+X}}
\end{aligned}
$$

$$
X \stackrel{\text { def }}{=} 4-\left(1-(-1)^{m-1}\right) \frac{T^{2}}{2}
$$

For $n$ sufficiently large, we have $m T-\Delta<T+1$ And since $m T-\Delta$ is an integer, $m T-\Delta \leq T$ This completes the proof of the lemma

## Delay model

The scheme 2(i)- 2(iii) given above
for dividing the bits of a carry skip adder
Into groups is optimal for $2 \leq T \leq 7$
Assume the scheme is not optimal and
let $D$ be the maximum delay
Corresponding to an optimal division of the bits into groups
Assume there are $r$ groups in the optimal division.
Since a carry in signal to the least significant bit group can skip over
Each group we have $r T \leq D \leq m T$, so $r \leq m$
If $r=m$ then $D=m T$ and the theorem holds by Lemma 1

## Delay model

If $r=m-1 \mathrm{~m}$ and $r$ have different parities and it follows from (5) that $m T-D<T \quad$ for $2 \leq T \leq 7$
so that $D>(m-1) T=r T$

This means that a signal which skips over each of the $r$ groups has delay less than the maximum \%DELTA

Similarly, if $r<m-1, \quad D>(m-1) T>r T$ So that a signal which skips over each group has delay < D

## Delay model

It follows that a signal with delay \%DELTA must start in a group I,

Ripple to the end of this group, then skip over s < r groups and either terminate, Or ripple through the first few bits of a group $\mathrm{j}>\mathrm{I}$.

Let x _i and x _ denote the lengths of the ith and jth groups respectively.

## Delay model

Assume that I is chosen as small as possible and j as large as possible.

A signal originating in group I, rippling to the end of this group and then skipping over the next s group has delay \%DELTA <= (x_i-1) + sT <= (x_i-1) + (r-1)T <= (x_i 1-1) + (m-2)T.

## Delay model

Since \%DELTA >= (m-1)T this implies that x _ $\mathrm{i}>=\mathrm{T}=1$
Divide group I into two groups such that the group containing the most significant Bits has size T .
Since the i-th group is the first group in which a signal having maximum delay can Originate, this subdivision does not increase the delay of any carry signal of maximum delay
However, it increases the number of groups by 1
Suppose now that a carry signal originates in a group I, ripples to its end, Skips over $\mathrm{s}<=\mathrm{r}$-2 groups and finally ripples through the first few bits of a group $J$ and terminates. We then have
\%DELTA <= (x_i-1) + sT + (x j-1) <= x_i + x_j-2 + (m - 3)T
pp

## Delay model

So that either x _i $>=\mathrm{T}+1$ or $\mathrm{x}-\mathrm{j}>=\mathrm{T}+1$.
This means that we can subdivide one of the groups I,j without increasing \%DELTA
Continuing in this way, we can always increase the number r of group In an optimal division of a carry chain by 1 without increasing \%DELTA If $r<m$
This means that we can arrive at an optimal division of the carry chain Into m groups.
We must then have \%DELTA >= mT which, together with Lemma 2, Implies \%DELTA = mT
This completes the proof of the theorem

## Delay model



$$
\begin{aligned}
y_{i}= & \min \{1+i T, 1+(m+1-i) T\} \\
& y_{1,}, \ldots, y_{m} \\
0 \leq & x_{i} \leq y_{i}, i=1, \ldots, m
\end{aligned}
$$

Oklobdzija: High-Speed VLSI arithmetic units : adders and multipliers

$$
\sum_{i=1}^{m} x_{i}=n
$$

## Delay model



## Delay model

Given $m$, an optimal division of the carry chain into groups
Can be obtained as follows
Let

$$
y_{i}=\min \{1+i T, 1+(m+1-i) T\}
$$

Given $y_{1, \ldots, y_{m}}$ solve the minimization problem

$$
\min \max \left\{x_{1}, \ldots, x_{n}\right\}
$$

Subject to

$$
0 \leq x_{i} \leq y_{i}, i=1, \ldots, m
$$

And

$$
\sum_{i=1}^{m} x_{i}=n
$$

Any solution $x_{1,}, \ldots, x_{m}$ gives optimal group sizes for a division of the carry chain

## Delay model

The x's can be computed iteratively as follows:

Initially take $x_{1}=x_{m}=0$

At each iteration, increase as many of the x's as possible by one unit, without violating the constraints

$$
0 \leq x_{i} \leq y_{i}, i=1, \ldots, m \quad \sum_{i=1}^{m} x_{i} \leq n
$$

An easy calculation shows that

$$
\sum_{i=1}^{m} y_{i}=m+\frac{1}{2} m T+\frac{1}{4} m^{2} T+\left(1-(-1)^{m}\right) \frac{1}{8} T \geq n
$$

Thus, at some iteration, we have $\sum_{i=1}^{m} x_{i}=n$ and
The algorithm terminates

## Delay model

For $n=32$, we have $m=7$, $(y 1, y 2, y 3, y 4, y 5, y 6, y 7)=(3,5,7,9,7,5,3)$
The above algorithm gives $(x 1, x 2, x 3, x 4, x 5, x 6, x 7)=(3,5,5,6,5,5,3)$
A carry chain divided in this way has maximum delay $\mathrm{D}=\mathrm{mT}=14$
Since one unit of delay is 0.8 ns, the maximum delay for 32 -bit carry chain is $D=14^{*} 0.8 \mathrm{~ns}=11.2 \mathrm{~ns}$
This time involves only the delay in the carry chain
It is easy to check that this is also the delay for a chain divided into groups of sizes $1,3,5,7,7,5,3,1$.
Thus this is also an optimal subdivision
The worst case delay includes the time needed to generate $p_{i}$ and $g_{i}$ signals
Delay of the carry chain, and the time for producing last sum bit $\mathrm{s}_{\mathrm{n}}$

## Delay model

Implement it with a string of multiplexers

The multiplexer cell is designed as very fast

Multiplexers are designed
as very fast structures using buffered pass gates and in this sense are similar to the Manchester carry chain which has been shown to be the most effective implementation of a carry chain

## Delay model

The implementation of a single carry block is done by mixing a 4 to 1 multiplexer (actually used as a 3 to 1 )

In the last stage with a string of 2 to 1 multiplexers
a carry bypass is connected to inputs 3 and 4 of the $4: 1$ multiplexer (group carry multiplexer) and the selection of the carry bypass is activated by the NAND gate singaling when the condition for group propagate is reached and activating the group multiplexer in turn.

## Delay model

The32-bit implementation of the VBA adder is obtained
By connecting the groups of the sizes calculated
For the full length of $n=32$ bits

To increase the speed further we used a faster inverting version Of the multiplexer, alternating between Ci and Cb _i signals

