Page 538 - DSP Integrated Circuits
structure, the pixels that are multiplied by the same coefficient are added (or subtracted). This reduces the number of terms in the remaining inner products by 50%. The chip area is thereby reduced from $O(2^{N})$ to $O(2^{N/2})$, which is a significant reduction. In comparison, the area for the bit-serial adders is insignificant. Figure 11.58 shows a block diagram for the DCT PE.
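The folding step can be illustrated with a minimal sketch. It assumes the standard unnormalized DCT-II basis, which may differ in scaling from the DCT variant used in the text; the function names (`dct_direct`, `dct_folded`, `rom_words`) and the test data are hypothetical. Because the basis satisfies c_k[N-1-n] = (-1)^k c_k[n], mirrored pixels share a coefficient up to sign and can be pre-added or pre-subtracted, which halves the length of every inner product.

```python
import math

N = 16  # transform length (16-point 1-D DCT, as in the text)

def dct_coeff(k, n, size=N):
    """Unnormalized DCT-II basis value cos(pi*(2n+1)*k / (2*size)) (assumed variant)."""
    return math.cos(math.pi * (2 * n + 1) * k / (2 * size))

def dct_direct(x):
    """Direct form: each output is an inner product with N terms."""
    return [sum(dct_coeff(k, n) * x[n] for n in range(N)) for k in range(N)]

def dct_folded(x):
    """Folded form: pre-add/subtract mirrored pixels, then use N/2-term inner products."""
    half = N // 2
    sums = [x[n] + x[N - 1 - n] for n in range(half)]   # feeds even-indexed outputs
    diffs = [x[n] - x[N - 1 - n] for n in range(half)]  # feeds odd-indexed outputs
    out = []
    for k in range(N):
        folded = sums if k % 2 == 0 else diffs
        out.append(sum(dct_coeff(k, n) * folded[n] for n in range(half)))
    return out

def rom_words(terms):
    """Words in a distributed-arithmetic ROM addressed by one bit from each term."""
    return 2 ** terms

pixels = [((7 * n) % 23) - 11 for n in range(N)]  # arbitrary test data
direct = dct_direct(pixels)
folded = dct_folded(pixels)
max_error = max(abs(a - b) for a, b in zip(direct, folded))
```

Both forms agree to within floating-point rounding, while the lookup table per inner product shrinks from `rom_words(16)` = 65536 words to `rom_words(8)` = 256 words. The extra cost is one adder or subtractor per pixel pair.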
A 2-D DCT for 16 × 16 pixels can be built using only one 1-D DCT PE, which itself consists of 16 distributed arithmetic units with N = 8. The TSPC-based shift-accumulator in Figure 11.51 can be used to implement a distributed arithmetic unit. The length of the shift-accumulator depends on the word length, $W_{ROM}$, which depends on the coefficients in the vector-products. In this case we assume that $W_{ROM} = W_c + 1 = 12$ bits. The ROM corresponding to each bit-slice is organized to have eight rows and $2^{8-3}$ columns in order to have about the same width as the bit-slices.
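How one distributed arithmetic unit evaluates an inner product with N = 8 can be sketched as follows. This is a behavioral model under stated assumptions, not the TSPC circuit: the data word length `W_DATA`, the integer coefficients, and all function names are hypothetical, and the accumulator applies explicit bit weights where the hardware shift-accumulator would shift its partial sum. The helper `rom_layout` assumes the ROM array is sized so that eight rows hold all 2^N words.

```python
N_TERMS = 8   # inner-product length per unit (N = 8 in the text)
W_DATA = 12   # assumed data word length in bits (not given in this passage)

def build_rom(coeffs):
    """Precompute all 2^N partial sums: entry `addr` sums the coefficients whose address bit is set."""
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << len(coeffs))]

def da_inner_product(rom, xs, width=W_DATA):
    """Bit-serial inner product of two's-complement integers xs with the ROM's coefficients."""
    acc = 0
    for bit in range(width):
        # One bit from each input word forms the ROM address for this cycle.
        addr = 0
        for i, x in enumerate(xs):
            addr |= ((x >> bit) & 1) << i
        term = rom[addr]
        # The sign bit carries negative weight in two's complement.
        if bit == width - 1:
            acc -= term << bit
        else:
            acc += term << bit
    return acc

def rom_layout(n_terms, rows=8):
    """Rows and columns of a ROM array holding 2^n_terms words (assumed organization)."""
    return rows, (1 << n_terms) // rows

coeffs = [3, -5, 7, 11, -13, 17, -19, 23]      # hypothetical integer coefficients
xs = [100, -200, 5, -1, 2047, -2048, 0, 77]    # inputs within the 12-bit range
rom = build_rom(coeffs)
result = da_inner_product(rom, xs)
expected = sum(c * x for c, x in zip(coeffs, xs))
```

Here `result` equals `expected`, using only table lookups, additions, and one subtraction for the sign bit. No multiplier is needed, which is why the ROM dominates the area of each unit.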
The area for a 1-D DCT PE is estimated to be

$$A_{DCT} \approx 16 A_{DA} + A_{Wire} \approx 16 \cdot 0.246 \cdot 1.3\ \text{mm}^2 \approx 5.2\ \text{mm}^2$$

where we have assumed that the area reserved for wiring is about 30%.

