Page 525 - DSP Integrated Circuits

510                                            Chapter 11 Processing Elements


            The first W_d clock cycles are used to accumulate values from the ROM, while
        the last W_ROM clock cycles are used to shift the result out of the shift registers.
        Hence, the required number of clock cycles is

                W_d + W_ROM

            Notice that these two phases can be overlapped with subsequent operations so
        that two operations are performed concurrently. In a typical filter implementation
        W_d = 16 to 22 bits and W_ROM = 4 to 16 bits. Hence, the number of clock cycles
        necessary is W_d in most applications. The latency between the inputs and outputs is
        W_ROM clock cycles, and a new computation can start every W_d clock cycles. The word
        length of the result will be W_d + W_ROM - 1 bits. The result is split into two parts: the
        least significant part comes from the output of the last full-adder in the accumulator,
        and the most significant part is formed as the bit-serial sum of the carry-register
        and the sum-register. A special end-bit-slice is needed to form the desired output.
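
The cycle counts above can be collected in a short sketch. It takes W_d as the data word length and W_ROM as the ROM word length, which is our reading of the symbols. The function name and the dictionary keys are hypothetical, and `max(w_d, w_rom)` for the overlapped case is an assumption that reduces to the W_d stated in the text whenever W_d >= W_ROM.

```python
def shift_accumulator_timing(w_d: int, w_rom: int) -> dict:
    """Cycle counts for one inner product in a shift-accumulator (illustrative only)."""
    return {
        # Accumulate phase (w_d cycles) followed by shift-out phase (w_rom cycles).
        "cycles_without_overlap": w_d + w_rom,
        # Assumption: with the two phases overlapped, the longer phase sets the rate.
        "cycles_between_starts": max(w_d, w_rom),
        # Latency between inputs and outputs, as stated in the text.
        "latency": w_rom,
        # Word length of the result, as stated in the text.
        "result_bits": w_d + w_rom - 1,
    }


timing = shift_accumulator_timing(w_d=16, w_rom=12)
print(timing)
```

For W_d = 16 and W_ROM = 12, values inside the typical ranges quoted above, a new computation can start every 16 clock cycles and the result is 27 bits wide.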
            A local control unit can be integrated into the shift-accumulator. All local
        control signals can be generated from a single external synchronization signal that
        initiates a new computation.
            Each bit-slice is provided with a D flip-flop. Together, these flip-flops form a
        shift register that generates delayed versions of the synchronization signal. The
        local control signal needed for selection of the least and most significant parts of
        the output is generated using this shift register. The control is therefore
        independent of the word length of the shift-accumulator. This simplifies the layout
        design and decreases the probability of design errors. It also decreases the
        probability of timing problems that can occur when a signal is distributed over a
        large distance.
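
The idea of deriving control from a chain of D flip-flops can be illustrated with a small behavioral model. This is only a sketch: the function name, the single-cycle synchronization pulse, and the four-slice chain are assumptions made for illustration, not details taken from the text.

```python
def delayed_sync(sync, n_slices):
    """Clock a chain of n_slices D flip-flops and record its state after each clock edge.

    sync is the sequence of input values, one per clock cycle.
    Element k of each recorded state is the output of the flip-flop in bit-slice k.
    """
    regs = [0] * n_slices
    history = []
    for s in sync:
        # On each clock edge every flip-flop takes the value of its predecessor.
        regs = [s] + regs[:-1]
        history.append(list(regs))
    return history


# A single synchronization pulse travels through a four-slice chain.
hist = delayed_sync([1, 0, 0, 0, 0, 0], n_slices=4)
for cycle, state in enumerate(hist):
    print(cycle, state)
```

Each bit-slice sees the pulse one cycle later than its neighbor, so adding bit-slices lengthens the chain automatically and no control logic has to be redesigned for a different word length.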


        11.16 REDUCING THE MEMORY SIZE

        The amount of memory required becomes very large for long inner products. There
        are two main ways to reduce the memory requirements. The two methods can be
        applied at the same time to obtain a very small amount of memory.

        11.16.1 Memory Partitioning

        One of several possible ways to reduce the overall memory requirement is to
        partition the memory into smaller pieces whose outputs are added before the
        shift-accumulator, as shown in Figure 11.46. The amount of memory is reduced from
        2^N words to 2 · 2^(N/2) words if the original memory is partitioned into two
        parts. For example, for N = 10 the requirement drops from 2^10 = 1024 words to
        2 · 2^5 = 64 words. Hence, this approach reduces the memory significantly at
        the cost of an additional adder.

        Figure 11.46 Reducing the memory by partitioning
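
The saving can be checked numerically with a minimal sketch. It assumes the usual distributed-arithmetic layout, in which the memory word at a given address holds the sum of the coefficients selected by the address bits. This excerpt does not spell out the memory contents, and the coefficient values and function names below are hypothetical.

```python
def build_rom(coeffs):
    """Entry at address a is the sum of coeffs[i] over every bit i that is set in a."""
    n = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << n)
    ]


coeffs = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3]  # hypothetical coefficients, N = 10
n = len(coeffs)
half = n // 2

full_rom = build_rom(coeffs)         # 2^N words
rom_lo = build_rom(coeffs[:half])    # 2^(N/2) words
rom_hi = build_rom(coeffs[half:])    # 2^(N/2) words


def lookup_partitioned(addr):
    """Address the two small memories separately and add their outputs."""
    lo = addr & ((1 << half) - 1)
    hi = addr >> half
    return rom_lo[lo] + rom_hi[hi]   # this addition is the extra adder


# Both organizations return the same value for every address.
assert all(full_rom[a] == lookup_partitioned(a) for a in range(1 << n))
print(len(full_rom), len(rom_lo) + len(rom_hi))  # prints: 1024 64
```

The two small tables together hold 64 words instead of 1024, and the only extra hardware modeled here is the single addition in `lookup_partitioned`.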