Page 413 - DSP Integrated Circuits
398 Chapter 9 Synthesis of DSP Architectures
multiplication can be implemented efficiently using distributed arithmetic, which
will be discussed in detail in Chapter 11. The cost in terms of execution time and
power consumption for such a vector multiplication is the same as for a scalar
multiplication; the area, though, is somewhat larger. A vector-multiplier with fewer
than eight terms requires only slightly more chip area than an ordinary multiplier.
The input and output values are bit-serial in distributed arithmetic. Hence,
implementations based on vector-multipliers are generally highly efficient. Further,
the design cost is low since the required circuits are highly regular, and
modularity allows automatic layout.
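To make the idea of a bit-serial vector-multiplier concrete, the following is a minimal software sketch of distributed arithmetic, not the circuit treated in Chapter 11. It assumes integer coefficients and two's-complement integer data of a given word length. The names `build_lut` and `da_inner_product` are hypothetical. One table look-up is made per bit position, addressed by one bit from each data word, and the look-up for the sign bit is subtracted.

```python
def build_lut(coeffs):
    """Return all 2**N partial sums; bit i of the address selects coeffs[i]."""
    n = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (address >> i) & 1)
        for address in range(2 ** n)
    ]


def da_inner_product(coeffs, xs, word_length):
    """Compute sum(coeffs[i] * xs[i]) bit-serially, least significant bit first.

    Assumption: every x fits in a two's-complement word of word_length bits.
    """
    lut = build_lut(coeffs)
    mask = (1 << word_length) - 1
    words = [x & mask for x in xs]  # two's-complement bit patterns
    acc = 0
    for k in range(word_length):
        # Form the table address from bit k of every data word.
        address = 0
        for i, w in enumerate(words):
            address |= ((w >> k) & 1) << i
        term = lut[address] << k
        if k == word_length - 1:
            acc -= term  # the sign bit carries the weight -2**(word_length - 1)
        else:
            acc += term
    return acc


print(da_inner_product([3, -5, 7], [4, -3, -8], 8))  # 3*4 + (-5)*(-3) + 7*(-8) = -29
```

The number of look-ups equals the word length regardless of the number of terms, which is why the execution time matches that of a single bit-serial scalar multiplication. A hardware realization would typically shift the accumulator instead of the table output; the sketch shifts the table output only to keep the integer arithmetic exact.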
The direct form FIR filter shown in Figure 4.4 can be implemented directly by
a single vector-multiplier that computes the output and a set of shift registers.
However, the required chip area for a distributed arithmetic-based implementation
increases exponentially with the length of the filter and will become excessively
large for N larger than about 12 to 14. In Chapter 11 we will discuss various
techniques to reduce the area.
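The exponential growth can be illustrated with a short calculation. The sketch below assumes a single, unpartitioned look-up table holding one word for every combination of N input bits, that is, 2^N words. The function name `da_table_words` is hypothetical, and the figures say nothing about the reduced-area techniques of Chapter 11.

```python
def da_table_words(n_terms):
    """Words in one unpartitioned look-up table for an n_terms vector-multiplier."""
    return 2 ** n_terms


for n in (5, 8, 12, 14):
    print(n, da_table_words(n))
# 5 32
# 8 256
# 12 4096
# 14 16384
```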
The increase in chip area, compared to an ordinary multiplier, will generally
be small for recursive digital filters since N is typically less than 5 to 8. We
demonstrate by an example how vector-multipliers can be used to implement recursive
digital filters.
EXAMPLE 9.5
Determine the number of vector-multipliers and a block diagram for the
corresponding implementation of the bandpass filter discussed in Examples 4.4 and 4.5.
Use pipelined second-order sections.
The bandpass filter was implemented in cascade form with four second-order
sections in direct form I. The numbers of scalar multipliers and adders for a
classical implementation are 20 and 16, respectively. The number of shift registers is 14.
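Now, observe that the output of each section is a vector-multiplication
between constant coefficient vectors and data vectors of the form
$$y = \mathbf{a}^T \mathbf{x}$$
where
$$\mathbf{a} = (a_0, a_1, a_2, b_1, b_2)^T, \qquad \mathbf{x} = (x(n), x(n-1), x(n-2), y(n-1), y(n-2))^T$$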
Only four vector-multipliers are needed, one per second-order section. The
cost in terms of PEs has been reduced significantly. The block diagram for the
implementation is shown in Figure 9.13. Note that the vector-multipliers can work
in parallel and that this approach results in modular and regular hardware that
can be implemented with a small amount of design work.
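The mapping of one second-order section onto one vector-multiplier can be sketched in software as follows. The sketch assumes the direct form I difference equation y(n) = a0 x(n) + a1 x(n-1) + a2 x(n-2) + b1 y(n-1) + b2 y(n-2), so that each section has five coefficients, which agrees with the 20 scalar multipliers counted above for four sections. The sign convention for b1 and b2 and the name `run_cascade` are assumptions. The sections are evaluated one after another, so the pipeline registers that let the hardware sections work in parallel are not modeled.

```python
def run_cascade(sections, samples):
    """Filter samples through cascaded direct form I second-order sections.

    sections: list of coefficient vectors (a0, a1, a2, b1, b2), one per section.
    Each section output is computed as a single inner product.
    """
    # Shift-register contents per section: [x(n-1), x(n-2), y(n-1), y(n-2)]
    states = [[0.0, 0.0, 0.0, 0.0] for _ in sections]
    outputs = []
    for x in samples:
        for coeffs, st in zip(sections, states):
            data = [x, st[0], st[1], st[2], st[3]]
            y = sum(c * d for c, d in zip(coeffs, data))  # one vector-multiplication
            st[:] = [x, st[0], y, st[2]]  # shift the input and output delay lines
            x = y  # the output feeds the next section
        outputs.append(x)
    return outputs


# Impulse response of a single section with one pole at z = 0.5.
print(run_cascade([(1, 0, 0, 0.5, 0)], [1, 0, 0, 0]))  # [1.0, 0.5, 0.25, 0.125]
```

In this formulation the only arithmetic unit per section is the inner product; everything else is delay storage, which reflects the regularity noted in the text.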
The maximal sample frequency is inversely proportional to the data word length.
Clock frequencies of 400 MHz or more can be achieved in a 0.8-µm CMOS process.
Hence, with a typical data word length of 20 bits, sample frequencies up to
$f_{CL}/W_d = 400/20 = 20$ Msamples/s can be achieved.
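The relation between clock frequency and sample frequency can be checked with a short calculation. The sketch assumes that a bit-serial vector-multiplier needs one clock cycle per data bit, so that one sample is processed every W_d cycles. The function name `max_sample_rate` is hypothetical.

```python
def max_sample_rate(clock_hz, word_length_bits):
    """Upper bound on the sample rate when one data bit is processed per clock cycle."""
    return clock_hz / word_length_bits


print(max_sample_rate(400e6, 20) / 1e6)  # 20.0 (Msamples/s)
```

Doubling the data word length at the same clock frequency halves the achievable sample frequency under this assumption.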