Page 538 - DSP Integrated Circuits


        structure, the pixels that are multiplied by the same coefficient are added (or
        subtracted). This reduces the number of terms in the remaining inner products by
        50%. The chip area is thereby reduced from $O(2^N)$ to $O(2^{N/2})$, which is a
        significant reduction. In comparison, the area for the bit-serial adders is
        insignificant. Figure 11.58 shows a block diagram for the DCT PE.
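
The folding step described above can be illustrated with a short Python sketch. It assumes an unnormalized 16-point DCT-II (scaling factors omitted) and uses hypothetical helper names (`dct_direct`, `dct_folded`, `rom_words`) that do not come from the text. It relies only on the symmetry $c_{k,N-1-n} = (-1)^k c_{k,n}$ of the cosine basis, so pixel pairs are added for even $k$ and subtracted for odd $k$.

```python
import math

N = 16  # transform length, assumed from the 16 x 16 pixel block in the text


def dct_coeff(k, n):
    # Unnormalized DCT-II basis value; scaling is omitted in this sketch.
    return math.cos(math.pi * (2 * n + 1) * k / (2 * N))


def dct_direct(x):
    # Reference computation: one 16-term inner product per output.
    return [sum(dct_coeff(k, n) * x[n] for n in range(N)) for k in range(N)]


def dct_folded(x):
    # Pre-add (even k) or pre-subtract (odd k) the pixel pairs that share a
    # coefficient magnitude, which leaves 8-term inner products.
    half = N // 2
    s = [x[n] + x[N - 1 - n] for n in range(half)]
    d = [x[n] - x[N - 1 - n] for n in range(half)]
    out = []
    for k in range(N):
        v = s if k % 2 == 0 else d
        out.append(sum(dct_coeff(k, n) * v[n] for n in range(half)))
    return out


def rom_words(terms):
    # A distributed arithmetic ROM stores one word per input bit pattern.
    return 2 ** terms
```

With this folding, each inner product has 8 terms instead of 16, so a table indexed by one bit from every input shrinks from `rom_words(16)` to `rom_words(8)` entries, in line with the reduction from $O(2^N)$ to $O(2^{N/2})$.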
            A 2-D DCT for 16 × 16 pixels can be built using only one 1-D DCT PE, which
        itself consists of 16 distributed arithmetic units with N = 8. The TSPC-based
        shift-accumulator in Figure 11.51 can be used to implement a distributed
        arithmetic unit. The length of the shift-accumulator depends on the word length,
        $W_{ROM}$, which depends on the coefficients in the vector-products. In this case we
        assume that $W_{ROM} = W_c + 1 = 12$ bits. The ROM corresponding to each bit-slice is
        organized to have eight rows and $2^{8-3}$ columns in order to have about the same
        width as the bit-slices.
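
A minimal software model of one distributed arithmetic unit with N = 8 inputs is sketched below. It is not the TSPC circuit from the text: it uses integer coefficients and weights each ROM output by a left shift, where the hardware shift-accumulator would shift the accumulated sum right, so that the result is exact and easy to check. The function names `build_rom` and `da_inner_product` are hypothetical.

```python
def build_rom(coeffs):
    # Entry 'addr' holds the sum of the coefficients whose address bit is 1.
    n = len(coeffs)
    return [
        sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(rom, xs, word_length):
    # xs are signed integers that fit in word_length-bit two's complement.
    mask = (1 << word_length) - 1
    words = [x & mask for x in xs]  # two's-complement bit patterns
    acc = 0
    for j in range(word_length):  # process the least significant bit first
        # Form the ROM address from bit j of every input word.
        addr = 0
        for i, w in enumerate(words):
            addr |= ((w >> j) & 1) << i
        term = rom[addr] << j
        # The most significant bit has negative weight in two's complement.
        acc += -term if j == word_length - 1 else term
    return acc
```

For eight inputs the table has 2^8 = 256 entries, and the inner product takes one table lookup and one accumulation per input bit, regardless of the number of coefficients.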
            The area for a 1-D DCT PE is estimated to be
                   $A_{DCT} \approx 16 A_{DA} + A_{Wire} \approx 16 \cdot 0.246 \cdot 1.3\ \text{mm}^2 \approx 5.2\ \text{mm}^2$
        where we have assumed that the area reserved for wiring is about 30%.
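
The estimate can be reproduced with a few lines of Python. The constants are taken from the text; the function name `dct_pe_area` is hypothetical, and the wiring allowance is modeled as a 30% overhead on the total area of the 16 units, which is how the factor 1.3 in the expression above is read here.

```python
N_UNITS = 16            # distributed arithmetic units per 1-D DCT PE (from the text)
A_DA_MM2 = 0.246        # area of one unit in mm^2 (from the text)
WIRING_OVERHEAD = 0.30  # about 30% reserved for wiring (from the text)


def dct_pe_area(n_units, a_da, overhead):
    # Total unit area scaled up by the assumed wiring overhead.
    return n_units * a_da * (1.0 + overhead)


area = dct_pe_area(N_UNITS, A_DA_MM2, WIRING_OVERHEAD)
```

Evaluated directly, the product 16 · 0.246 · 1.3 comes to roughly 5.1 mm^2, slightly below the quoted 5.2 mm^2. The difference may come from rounding in the per-unit area.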
