Page 512 - DSP Integrated Circuits

11.10 Bit-Serial Squarers

In the next step, $f_2 = f(x - x_1)$ is decomposed in the same manner into the
square of the most significant bit of $x - x_1$ with a rest term and a remaining
square $f_3$. The scheme is repeated as long as there are bits left to process in
the remaining square, i.e., until the square $f_n$ is reached.
Examining this scheme we find that in order to input a bit-serial word x with
the least significant bit first, we have to reverse the order of the iterations in the
preceding scheme to have a suitable algorithm.
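The iterative algorithm can then be written as

$$f_j = \Delta_j + f_{j+1}, \qquad j = n, n-1, \ldots, 1$$

where

$$\Delta_j = x_j \left( 2^{-2j} + 2^{1-j} \sum_{i=j+1}^{n} x_i 2^{-i} \right), \qquad f_{n+1} = 0$$
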
In each iteration $j$ we accumulate the previous term $f_{j+1}$ and input the next
bit $x_j$. If $x_j = 1$, then we add the square of the bit weight and the weights of
the bits that have arrived prior to bit $x_j$, shifted left $1-j$ positions. Then we
store bit $x_j$ for the next iteration. Examination of the bit weights accumulated in
each iteration reveals that the sum converges toward the correct square by at least
one bit in each step, going from the least significant bit in the result toward more
significant bits.
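The iteration described above can be sketched in software to check the bit weights. The following is a minimal sketch, assuming an unsigned fractional word $x = \sum_{i=1}^{n} x_i 2^{-i}$ with $x_n$ as the least significant bit; the function name `bit_serial_square` and the use of exact fractions are illustrative choices, not part of the original text.

```python
from fractions import Fraction


def bit_serial_square(bits_msb_first):
    """Square x = sum(x_i * 2**-i), i = 1..n, processing the LSB (x_n) first.

    bits_msb_first is [x_1, ..., x_n]; the unsigned fractional format
    is an assumption made for this sketch.
    """
    n = len(bits_msb_first)
    f = Fraction(0)     # accumulated square, starts at f_{n+1} = 0
    rest = Fraction(0)  # value of the bits that arrived before x_j
    for j in range(n, 0, -1):            # j = n, n-1, ..., 1
        x_j = bits_msb_first[j - 1]
        if x_j:
            # squared bit weight 2^(-2j) plus earlier bits scaled by 2^(1-j)
            f += Fraction(1, 2 ** (2 * j)) + Fraction(2, 2 ** j) * rest
            rest += Fraction(1, 2 ** j)  # store bit x_j for later iterations
    return f


print(bit_serial_square([1, 0, 1]))  # x = 5/8, prints 25/64
```

Exact rational arithmetic is used here only so that the result can be compared directly with $x^2$; a hardware realization would work on fixed-point bit vectors instead.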
An implementation of the preceding algorithm is shown in Figure 11.35. It
uses a shift-accumulator to shift the accumulated sum to the right after each
iteration. Thus, left-shifting of the stored bits $x_i$ is avoided, and the addition
of the squared bit weight of the incoming $x_j$ is reduced to a shift to the left in
each iteration. The implementation consists of n regular bit-slices, which makes it
suitable for hardware implementation.
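The effect of the shift-accumulator can be illustrated with a behavioral model. This is only a sketch under stated assumptions: it works on an unsigned integer `x` of `n` bits rather than a fractional word, and it does not model the gates or flip-flops of Figure 11.35. The name `shift_accumulate_square` is hypothetical.

```python
def shift_accumulate_square(x, n):
    """Behavioral model: square an n-bit unsigned integer, LSB first.

    The accumulator is shifted right once per iteration, so the stored
    bits keep a fixed position and only the squared bit weight moves
    one position to the left each time.
    """
    acc = 0        # shift-accumulator contents
    stored = 0     # bits received so far, kept in place
    out_bits = []  # result bits shifted out, least significant first
    for k in range(n):
        bit = (x >> k) & 1
        if bit:
            # squared bit weight (relative to the shifted accumulator)
            # plus twice the previously stored bits
            acc += (1 << k) + (stored << 1)
        stored |= bit << k
        out_bits.append(acc & 1)  # one finished result bit per iteration
        acc >>= 1                 # shift the accumulated sum to the right
    low = sum(b << i for i, b in enumerate(out_bits))
    return (acc << n) | low       # remaining accumulator bits are the high half


print(shift_accumulate_square(5, 3))  # prints 25
```

In this model the low n result bits leave the accumulator one per iteration, and the high part remains in the accumulator after the last input bit.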
The operation of the squarer in Figure 11.35 is as follows: All D flip-flops and
SR flip-flops are assumed to be cleared before the first iteration. In the first
iteration, the control signal c(0) is high while the remaining control signals are
low. This allows the first bit $x(0) = x_n$ to pass the AND gate on top of the
rightmost bit-slice. The value $x_n$ is then stored in the SR flip-flop of the same
bit-slice for later use. Also, $x_n$ is added to the accumulated sum via the OR gate
in the same bit-slice,³

³ An OR gate is sufficient to perform the addition.
