Page 398 - DSP Integrated Circuits


A perfectly balanced architecture is obtained if the number of memories is
doubled, i.e., we have an interleaving factor of two. An even faster
implementation is obtained if the PE is pipelined by introducing a set of
flip-flops between the multiplier and the adder and subtractor. This architecture
can again be balanced by increasing the interleaving of the memories by a further
factor of two. This is possible since the FFT is a nonrecursive algorithm.
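
The relation between PE speed and interleaving factor can be illustrated with a
small calculation. This is only a sketch under a simplified bandwidth model, not
the book's inequality (8.1): the function name, the four accesses per butterfly,
and the timing figures are all assumed for illustration.

```python
import math

def banks_needed(accesses_per_op, t_mem_ns, t_pe_ns):
    """Smallest number of interleaved memory banks whose combined bandwidth
    (banks / t_mem_ns) covers the PE's demand (accesses_per_op / t_pe_ns).

    Simplified model: every bank delivers one word per t_mem_ns and
    accesses are spread evenly over the banks.
    """
    return math.ceil(accesses_per_op * t_mem_ns / t_pe_ns)

# Hypothetical figures: a butterfly PE needing 4 memory accesses per
# operation, 50 ns memories, and a PE that starts a butterfly every 100 ns.
unpipelined = banks_needed(4, 50, 100)

# Assumption: one pipeline stage between the multiplier and the
# adder/subtractor halves the interval between butterflies.
pipelined = banks_needed(4, 50, 50)

print(unpipelined, pipelined)
```

Under these assumed numbers, halving the PE's initiation interval doubles the
number of banks needed to keep it busy, which is the further factor of two in
interleaving described above.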

Cache Memories
A common technique to reduce the communication demand is to provide the
processors with fast private memories. Each processor can then access its cache
memory without interference from the other processors. This scheme works well if
the relevant data are kept in the cache memories and if the communication
demands between the cache memories and the main memory are relatively small.
This scheme allows the use of slower and less expensive main memories. We will,
in the examples used as case studies, use cache memories to obtain balanced
architectures.
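
A rough way to see why private caches permit a slower main memory is to estimate
the traffic that still reaches it. The sketch below is a minimal model with
hypothetical function names and numbers; it counts only cache misses and ignores
write-back and coherence traffic.

```python
def main_memory_demand(num_pes, accesses_per_second, hit_ratio):
    """Accesses per second that miss in the private caches and therefore
    have to be served by the shared main memory."""
    return num_pes * accesses_per_second * (1.0 - hit_ratio)

def is_balanced(num_pes, accesses_per_second, hit_ratio, main_memory_bandwidth):
    """True if the main memory can keep up with the residual traffic."""
    demand = main_memory_demand(num_pes, accesses_per_second, hit_ratio)
    return demand <= main_memory_bandwidth

# Hypothetical system: 4 PEs, each issuing 20e6 accesses per second, and a
# main memory that sustains 25e6 accesses per second.
print(is_balanced(4, 20e6, 0.0, 25e6))   # no caches: demand is 80e6 per second
print(is_balanced(4, 20e6, 0.75, 25e6))  # 75% hits: demand is 20e6 per second
```

In this assumed configuration the same main memory is overloaded without caches
but sufficient once three out of four accesses are served locally.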

8.9.4 Large Basic Operations
The third factor in inequality (8.1) affecting architectural balance is the
execution time of the PEs. Obviously, if we use PEs with a large granularity, the
execution time will be longer, but more useful work will be done per operation.
For example, a butterfly PE is preferred over separate PEs that perform simpler
operations such as add, subtract, and multiply. Further, fewer memory
transactions may be needed if direct interprocessor communication is allowed. It
is interesting to note that, contrary to the current trend of RISC technology, it
is advantageous to use basic operations that are as large as possible, since
large basic operations tend to do more useful work on a given set of input data
and thereby tend to reduce the communication requirement.
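
The effect of granularity on memory traffic can be made concrete by counting
transactions for one radix-2 butterfly, x = a + w*b and y = a - w*b. The sketch
below is illustrative only: the CountingMemory class, the key names, and the
assumption that separate PEs exchange the intermediate product through memory
are hypothetical and not taken from the text.

```python
class CountingMemory:
    """Toy memory that counts every read and write as one transaction."""

    def __init__(self, data):
        self.data = dict(data)
        self.transactions = 0

    def read(self, key):
        self.transactions += 1
        return self.data[key]

    def write(self, key, value):
        self.transactions += 1
        self.data[key] = value

def butterfly_pe(mem):
    """One large PE: operands are read once, the product stays inside the PE."""
    a, b, w = mem.read("a"), mem.read("b"), mem.read("w")
    t = w * b
    mem.write("x", a + t)
    mem.write("y", a - t)

def separate_pes(mem):
    """Three small PEs that pass the intermediate product through memory."""
    mem.write("t", mem.read("w") * mem.read("b"))   # multiplier PE
    mem.write("x", mem.read("a") + mem.read("t"))   # adder PE
    mem.write("y", mem.read("a") - mem.read("t"))   # subtractor PE

inputs = {"a": 1 + 2j, "b": 3 - 1j, "w": -1j}
large, small = CountingMemory(inputs), CountingMemory(inputs)
butterfly_pe(large)
separate_pes(small)
print(large.transactions, small.transactions)
```

Both variants compute the same outputs, but in this model the butterfly PE needs
5 memory transactions where the three simple PEs need 9. Forwarding the product
directly between PEs instead of through memory would remove some of the extra
transactions.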
In Chapter 9 we will show that it is efficient to use more but slower PEs to
obtain a balanced architecture [24, 25].

REFERENCES
[1] Almasi G.S. and Gottlieb A.: Highly Parallel Computing, Benjamin/Cummings,
Redwood City, CA, 1989.
[2] Anceau F.: The Architecture of Microprocessors, Addison-Wesley, Wokingham,
England, 1986.
[3] Bhuyan L.N., Yang Q., and Agrawal D.P.: Performance of Multiprocessor
Interconnection Networks, IEEE Computer, Vol. 22, No. 2, pp. 25-37, Feb.
1989.
[4] Chen C.H. (Ed.): Signal Processing Handbook, Marcel Dekker, New York,
1988.
[5] DeCegama A.L.: The Technology of Parallel Processing: Parallel Architectures
and VLSI Hardware, Vol. 1, Prentice Hall, Englewood Cliffs, New Jersey,
1989.
