Page 413 - DSP Integrated Circuits

398                                     Chapter 9 Synthesis of DSP Architectures


        multiplication can be implemented efficiently using distributed arithmetic, which
        will be discussed in detail in Chapter 11. The cost in terms of execution time and
        power consumption for such a vector multiplication is the same as for a scalar
        multiplication; the area, though, is somewhat larger. A vector-multiplier with fewer
        than eight terms requires only slightly more chip area than an ordinary multiplier.
        The input and output values are bit-serial in distributed arithmetic. Hence,
        implementations based on vector-multipliers are generally highly efficient. Fur-
        ther, the design cost is low since the required circuits are highly regular, and
        modularity allows automatic layout.
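
The principle can be illustrated with a minimal software sketch of a vector-multiplier based on distributed arithmetic. The function names, the integer coefficients, and the least-significant-bit-first ordering are assumptions made for illustration, not details taken from the text; a hardware implementation would be organized differently. All partial sums of the N coefficients are stored in a table with 2^N entries, and in every clock cycle one bit from each of the N data words forms the table address:

```python
def build_lut(coeffs):
    """Return the table of all 2**N partial sums of the N coefficients."""
    n = len(coeffs)
    return [sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, data, word_length):
    """Bit-serial inner product of coeffs and two's-complement integer data.

    Each data value must lie in [-2**(word_length-1), 2**(word_length-1) - 1].
    """
    lut = build_lut(coeffs)
    mask = (1 << word_length) - 1
    words = [x & mask for x in data]      # two's-complement bit patterns
    acc = 0
    for j in range(word_length):          # one iteration per data bit, LSB first
        addr = 0
        for k, w in enumerate(words):     # bit j of every data word forms the address
            addr |= ((w >> j) & 1) << k
        term = lut[addr] * (1 << j)
        if j == word_length - 1:
            acc -= term                   # the sign bit has weight -2**(W-1)
        else:
            acc += term
    return acc


coeffs = [3, -5, 7, 2]
data = [10, -4, 0, -128]
print(da_inner_product(coeffs, data, word_length=8))  # -206
```

The loop runs once per data bit regardless of the number of terms, which is consistent with the statement that the execution time equals that of a bit-serial scalar multiplication; only the table grows with N.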
            The direct form FIR filter shown in Figure 4.4 can be implemented directly by
        a single vector-multiplier that computes the output and a set of shift registers.
        However, the required chip area for a distributed arithmetic-based implementa-
        tion increases exponentially with the length of the filter and will become exces-
        sively large for N larger than about 12 to 14. In Chapter 11 we will discuss various
        techniques to reduce the area.
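
The exponential growth can be made concrete with a short sketch. It assumes a single, unpartitioned look-up table that holds every partial sum of the coefficients; the area-reduction techniques of Chapter 11 are not modeled, and the function name is hypothetical:

```python
def da_table_words(num_terms):
    """Words in an unpartitioned distributed-arithmetic look-up table."""
    return 2 ** num_terms


for n in (5, 8, 12, 14):
    print(n, da_table_words(n))
# prints one pair per line: 5 32, 8 256, 12 4096, 14 16384
```

Under this assumption each additional term doubles the table, so a filter with 14 terms needs 512 times as many words as one with 5 terms.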
            The increase in chip area, compared to an ordinary multiplier, will generally
        be small for recursive digital filters since N is typically less than 5 to 8. We demon-
        strate by an example how vector-multipliers can be used to implement recursive
        digital filters.





        EXAMPLE 9.5
        Determine the number of vector-multipliers and a block diagram for the corre-
        sponding implementation of the bandpass filter discussed in Examples 4.4 and 4.5.
        Use pipelined second-order sections.
            The bandpass filter was implemented in cascade form with four second-order
        sections in direct form I. The numbers of scalar multipliers and adders for a classi-
        cal implementation are 20 and 16, respectively. The number of shift registers is 14.
            Now, observe that the output of each section is a vector-multiplication
        between constant coefficient vectors and data vectors of the form




        where





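For a second-order section in direct form I, such an inner product can be sketched as follows. The symbol names (a_i for the feed-forward and b_i for the feedback coefficients) and the convention that the signs of the feedback terms are absorbed into b_1 and b_2 are assumptions, not taken from the text:

```latex
y(n) = \mathbf{a}^{T}\mathbf{x}(n)
     = a_0 x(n) + a_1 x(n-1) + a_2 x(n-2) + b_1 y(n-1) + b_2 y(n-2)

\mathbf{a} = (a_0,\; a_1,\; a_2,\; b_1,\; b_2)^{T}, \qquad
\mathbf{x}(n) = \bigl(x(n),\; x(n-1),\; x(n-2),\; y(n-1),\; y(n-2)\bigr)^{T}
```

With this form each section contributes one five-term inner product per sample.
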
            Only four vector-multipliers are needed, one per second-order section. The
        cost in terms of PEs has been reduced significantly, from 20 multipliers and 16
        adders to four vector-multipliers. The block diagram for the implementation is
        shown in Figure 9.13. Note that the vector-multipliers can work in parallel and
        that this approach results in modular and regular hardware that can be
        implemented with a small amount of design work.
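
The one-vector-multiplication-per-section structure can be sketched in software as follows. This is an illustrative model only: the function name, the coefficient ordering (a0, a1, a2, b1, b2), and the convention that feedback signs are absorbed into b1 and b2 are assumptions, and the sections are evaluated one after another rather than pipelined as in Figure 9.13:

```python
def run_cascade(sections, samples):
    """Filter samples through cascaded direct form I second-order sections.

    Each section is a tuple (a0, a1, a2, b1, b2). Its output is a single
    five-term inner product, i.e. one vector-multiplication per sample.
    """
    # Per-section delay lines: [x(n-1), x(n-2), y(n-1), y(n-2)]
    states = [[0.0, 0.0, 0.0, 0.0] for _ in sections]
    out = []
    for x in samples:
        for coeffs, s in zip(sections, states):
            data = (x, s[0], s[1], s[2], s[3])
            y = sum(c * d for c, d in zip(coeffs, data))  # one vector-multiplication
            s[:] = [x, s[0], y, s[2]]                     # shift the delay lines
            x = y                                         # output feeds the next section
        out.append(x)
    return out


# One section with a0 = 1 and b1 = 0.5: the impulse response halves each sample.
print(run_cascade([(1.0, 0.0, 0.0, 0.5, 0.0)], [1, 0, 0, 0]))
# [1.0, 0.5, 0.25, 0.125]
```

For the bandpass filter of the example, `sections` would hold four coefficient tuples, so the inner loop performs four vector-multiplications per sample.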
            The maximal sample frequency is inversely proportional to the data word
        length, since the data are processed bit-serially. Clock frequencies of 400 MHz
        or more can be achieved in a 0.8-µm CMOS process. Hence, with a typical data
        word length of 20 bits, sample frequencies up to
        f_CL/W_d = 400/20 = 20 Msamples/s can be achieved.
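
The same arithmetic can be written as a small helper. The function name is hypothetical, and the model assumes exactly one clock cycle per data bit with no additional overhead:

```python
def max_sample_rate(clock_hz, word_length_bits):
    """Upper bound on the sample rate when one data bit is processed per clock cycle."""
    return clock_hz / word_length_bits


print(max_sample_rate(400e6, 20) / 1e6)  # 20.0 (Msamples/s)
```

Halving the data word length to 10 bits would, under the same assumption, double the bound to 40 Msamples/s.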