Page 525 - DSP Integrated Circuits
510 Chapter 11 Processing Elements
The first W_d clock cycles are used to accumulate values from the ROM, while
the last W_ROM clock cycles are used to shift the result out of the shift registers.
Hence, the required number of clock cycles is
Notice that these two phases can be overlapped with subsequent operations so
that two operations are performed concurrently. In a typical filter implementation
W_d = 16 to 22 bits and W_ROM = 4 to 16 bits. Hence, the number of clock cycles necessary
is W_d in most applications. The latency between the inputs and outputs is
W_ROM clock cycles, and a new computation can start every W_d clock cycles. The word
length of the result will be W_d + W_ROM - 1 bits. The result is split into two parts; the
least significant part comes from the output of the last full-adder in the accumulator
and the most significant part is formed as the bit-serial sum of the carry-register
and the sum-register. A special end-bit-slice is needed to form the desired output.
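The accumulate-and-shift behavior described above can be illustrated with a small behavioral sketch. It is not a model of the actual bit-slice hardware: it assumes integer two's-complement inputs fed least significant bit first, a ROM addressed by one bit from each input, and it ignores the carry-save details and the end-bit-slice. The names build_rom and shift_accumulate are hypothetical helpers introduced only for this illustration.

```python
def build_rom(coeffs):
    """ROM word at address a = sum of coeffs[i] for every set bit i of a."""
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n)]


def shift_accumulate(rom, xs, w_d):
    """Bit-serial inner product of two's-complement w_d-bit inputs xs."""
    acc = 0        # most significant part, held in the accumulator
    low_bits = 0   # least significant part, shifted out one bit per cycle
    for k in range(w_d):                  # one clock cycle per data bit
        # Bit k of every input forms the ROM address for this cycle.
        addr = sum(((x >> k) & 1) << i for i, x in enumerate(xs))
        term = rom[addr]
        if k == w_d - 1:                  # the sign bit has negative weight
            term = -term
        acc += term
        low_bits |= (acc & 1) << k        # the LSB leaves the accumulator
        acc >>= 1                         # arithmetic right shift
    # Recombine the two parts of the result.
    return (acc << w_d) + low_bits


# Illustrative values only (assumed, not taken from the text).
coeffs = [3, -5, 7, 2]
xs = [5, -3, -8, 7]                       # each fits in W_d = 4 bits
y = shift_accumulate(build_rom(coeffs), xs, w_d=4)
assert y == sum(c * x for c, x in zip(coeffs, xs))
```

The loop runs for W_d cycles and emits one low-order result bit per cycle, which corresponds to the least significant part of the output; the value left in acc corresponds to the most significant part that the hardware shifts out afterwards.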
A local control unit can be integrated into the shift-accumulator. All local control
signals can be generated from a single external synchronization signal that
initiates a new computation.
Each bit-slice is provided with a D flip-flop which forms a shift register generating
delayed versions of the synchronization signal. The local control signal
needed for selection of the least and most significant parts of the output is generated
using this shift register. The control is therefore independent of the word
length of the shift-accumulator. This simplifies the layout design and decreases
the probability of design errors. It also decreases the probability of timing problems
that can occur when a signal is distributed over a large distance.
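The idea of deriving the local control from a chain of D flip-flops, one per bit-slice, can be sketched as follows. This is a behavioral illustration under the assumption that every flip-flop is clocked by the same edge and that the chain starts cleared; the function name delayed_sync is hypothetical.

```python
def delayed_sync(sync, n_slices):
    """Return the flip-flop states after each clock edge.

    sync     -- sequence of 0/1 samples of the external synchronization signal
    n_slices -- number of bit-slices, i.e. flip-flops in the chain
    """
    regs = [0] * n_slices
    history = []
    for s in sync:
        # Each flip-flop latches the previous value of its neighbor, so
        # slice i sees the synchronization pulse i cycles after slice 0.
        regs = [s] + regs[:-1]
        history.append(list(regs))
    return history


# A single pulse travels through a 4-slice accumulator, one slice per cycle.
states = delayed_sync([1, 0, 0, 0, 0, 0], n_slices=4)
```

Adding a bit-slice simply extends the chain by one flip-flop, which is why the control does not depend on the word length.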
11.16 REDUCING THE MEMORY SIZE
The amount of memory required becomes very large for long inner products. There
are mainly two ways to reduce the memory requirements. The two methods can be
applied at the same time to obtain a very small amount of memory.
11.16.1 Memory Partitioning
One of several possible ways to reduce the overall memory requirement is to partition
the memory into smaller pieces whose outputs are added before the shift-accumulator, as
shown in Figure 11.46. The amount of memory is reduced from 2^N words to 2 · 2^(N/2)
words if the original memory is partitioned into two parts. For example, for N = 10
the memory shrinks from 2^10 = 1024 words to 2 · 2^5 = 64 words. Hence, this approach
reduces the memory significantly at the cost of an additional adder.

Figure 11.46 Reducing the memory by partitioning
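The equivalence between one large ROM and two smaller ROMs followed by an adder can be checked with a short sketch. It assumes that each ROM word is the sum of the coefficients selected by the address bits; the coefficient values and the helper names build_rom and partitioned_lookup are made up for illustration.

```python
def build_rom(coeffs):
    """ROM word at address a = sum of coeffs[i] for every set bit i of a."""
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n)]


def partitioned_lookup(rom_lo, rom_hi, addr, n_lo):
    """Look up both half-size ROMs and add their outputs (the extra adder)."""
    lo = addr & ((1 << n_lo) - 1)   # low n_lo address bits select rom_lo
    hi = addr >> n_lo               # remaining bits select rom_hi
    return rom_lo[lo] + rom_hi[hi]


# Illustrative coefficients for N = 10 (assumed values).
coeffs = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]
n = len(coeffs)
full_rom = build_rom(coeffs)          # 2**10 = 1024 words
rom_lo = build_rom(coeffs[: n // 2])  # 2**5 = 32 words
rom_hi = build_rom(coeffs[n // 2:])   # 2**5 = 32 words

# Every address gives the same value from either organization.
assert all(full_rom[a] == partitioned_lookup(rom_lo, rom_hi, a, n // 2)
           for a in range(1 << n))
```

The two half-size ROMs together hold 64 words instead of 1024, and the only added cost in this sketch is the single addition in partitioned_lookup.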

