Page 501 - DSP Integrated Circuits

486                                            Chapter 11 Processing Elements

        11.6.5 S/P Multiplier-Accumulator

        An alternative realization of the serial/parallel (S/P) multiplier with an additional
        input that allows computations of the type y = a·x + z is shown in Figure 11.18. The
        extra input allows a value z to be added at the same level of significance as x. A
        multiplier-accumulator is obtained if the output y is truncated or rounded to the
        same word length as x and added to the subsequent multiplication. A full-precision
        multiplier-accumulator is obtained if the part of y that is truncated is saved and
        used to set the sum D flip-flops, instead of resetting them, at the start of a
        multiplication.
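
The difference between the truncating and the full-precision multiplier-accumulator can be illustrated with a word-level arithmetic sketch. It does not model the bit-serial hardware of Figure 11.18. It assumes the coefficient is a fixed-point number a = a_int / 2**frac_bits, so that the product carries frac_bits low-order bits below the significance of x and z. The names mac_step, accumulate, frac_bits, and saved are hypothetical and chosen only for this example.

```python
def mac_step(a_int, x, z, frac_bits, saved=0):
    """One step y = a*x + z with a = a_int / 2**frac_bits (assumed format).

    z is added at the same significance as x, i.e. shifted up by frac_bits
    relative to the raw product. `saved` is the low-order part kept from the
    previous step (0 models resetting the sum flip-flops).
    Returns (y truncated to the weight of x, discarded low-order part).
    """
    full = a_int * x + (z << frac_bits) + saved
    y = full >> frac_bits                   # two's-complement truncation
    low = full & ((1 << frac_bits) - 1)     # the part that truncation drops
    return y, low


def accumulate(coeffs, samples, frac_bits, full_precision):
    """Sum of products, feeding each y back as z of the next multiplication."""
    y, saved = 0, 0
    for a_int, x in zip(coeffs, samples):
        y, low = mac_step(a_int, x, y, frac_bits, saved)
        # Full precision: keep the dropped bits; otherwise start from zero.
        saved = low if full_precision else 0
    return y


coeffs = [3, 5, 7]       # represent 3/16, 5/16, 7/16 with frac_bits = 4
samples = [10, 9, 3]     # exact sum of products: 96/16 = 6
print(accumulate(coeffs, samples, 4, full_precision=True))   # 6
print(accumulate(coeffs, samples, 4, full_precision=False))  # 4
```

In this example the truncating version loses two units because each intermediate truncation discards part of the running sum, whereas carrying the discarded part forward reproduces the exact result.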





















              Figure 11.18 Serial/parallel multiplier with an inherent input for addition


        11.7 BIT-SERIAL TWO-PORT ADAPTOR

        The PEs must be able to handle overflow and quantization of data. In order to
        detect overflow, extra guard bits must be introduced by extending the sign bit to
        the left. If the output from a PE is correct, the sign bit and the guard bits are
        equal; otherwise they differ. Hence, overflow detection is simple. However, due to
        the bit-serial processing, the correction of overflow must be delayed until the
        result is available. This may be a problem in recursive loops.
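
The guard-bit test described above can be sketched at word level as follows. The sketch assumes two's-complement words held as non-negative Python integers that are data_bits + guard_bits wide. It uses saturation as the correction, which is only one possible choice and is not stated in the text. The function names are hypothetical.

```python
def to_twos(value, width):
    """Two's-complement bit pattern of `value` in `width` bits."""
    return value & ((1 << width) - 1)


def overflowed(word, data_bits, guard_bits):
    """True if the sign bit and the guard bits are not all equal."""
    top = word >> (data_bits - 1)              # sign bit plus guard bits
    all_ones = (1 << (guard_bits + 1)) - 1
    return top not in (0, all_ones)


def saturate(word, data_bits, guard_bits):
    """Signed result limited to data_bits (saturation is an assumed policy)."""
    width = data_bits + guard_bits
    negative = (word >> (width - 1)) & 1       # leftmost guard bit = true sign
    if overflowed(word, data_bits, guard_bits):
        return -(1 << (data_bits - 1)) if negative else (1 << (data_bits - 1)) - 1
    value = word & ((1 << data_bits) - 1)
    return value - (1 << data_bits) if negative else value


# 8-bit data with 2 guard bits: 100 + 100 does not fit in 8 bits.
word = to_twos(100 + 100, 10)
print(overflowed(word, 8, 2))   # True
print(saturate(word, 8, 2))     # 127
```

Note that the check needs the most significant bits of the word, which in a bit-serial PE arrive last. This is why the correction cannot be applied before the complete result is available.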
            Quantization of data must also be performed in the recursive loops. In the
        case of truncation, quantization is simple and does not incur any extra clock
        cycles. The penalty for using more complicated quantization schemes may be
        additional clock cycles. In nonrecursive algorithms, such as the FFT and the DCT,
        extra pipeline stages can be used for overflow and quantization. Thus, the
        throughput will not be decreased.
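
As a small illustration of why truncation is cheaper than other schemes, the sketch below compares it with rounding on integer words. Rounding is used here only as an assumed example of a "more complicated" scheme. It needs an extra addition before the low-order bits are dropped, which in a bit-serial PE may translate into extra clock cycles. The function names are hypothetical.

```python
def truncate(value, drop_bits):
    """Two's-complement truncation: simply discard the low-order bits."""
    return value >> drop_bits


def round_half_up(value, drop_bits):
    """Rounding: add half an LSB of the result, then truncate."""
    return (value + (1 << (drop_bits - 1))) >> drop_bits


# 23/4 = 5.75: truncation gives 5, rounding gives 6.
print(truncate(23, 2), round_half_up(23, 2))
```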
            Bit-serial adders and multipliers implement adaptors in a wave digital filter
        efficiently. A block diagram for the bit-serial implementation of a symmetric two-
        port adaptor based on a serial/parallel multiplier is shown in Figure 11.19. For
        simplicity the word length of the adaptor coefficient is selected to be only 5 bits.
        Pipelining has been applied such that the longest path is a full-adder and a few
        gates. Execution time for the adaptor is

                                 $T_{PE} = (W_c + W_d + 2)\,T_{CL}$
        and latency is $(W_c - 1 + 2)\,T_{CL}$, where $T_{CL}$ is the clock period.
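
The two expressions can be evaluated for concrete word lengths with a few lines of code. The sketch assumes that W_c is the coefficient word length (5 bits in the example above) and W_d the data word length. The values W_d = 16 and a 10 ns clock period are made up purely for illustration.

```python
def execution_cycles(w_c, w_d):
    """Execution time of the adaptor in clock cycles: W_c + W_d + 2."""
    return w_c + w_d + 2


def latency_cycles(w_c):
    """Latency of the adaptor in clock cycles: W_c - 1 + 2."""
    return w_c - 1 + 2


w_c, w_d = 5, 16          # W_d = 16 is an assumed data word length
t_cl_ns = 10              # assumed clock period in nanoseconds

print(execution_cycles(w_c, w_d) * t_cl_ns)  # 230 (ns)
print(latency_cycles(w_c) * t_cl_ns)         # 60 (ns)
```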