Page 398 - DSP Integrated Circuits



            A perfectly balanced architecture is obtained if the number of memories is
        doubled, i.e., if we use an interleaving factor of two. An even faster
        implementation is obtained if the PE is pipelined by introducing a set of
        flip-flops between the multiplier and the adder and subtractor. This
        architecture can again be balanced by increasing the interleaving of the
        memories by a further factor of two. This is possible since the FFT is a
        nonrecursive algorithm: it contains no feedback loops that would limit the
        amount of pipelining.
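
        The balancing argument above can be illustrated with a minimal sketch. It
        uses a simplified supply-versus-demand condition as a stand-in, not the
        book's inequality (8.1). All names and numbers are assumptions chosen for
        illustration: a butterfly moves 4 words (2 reads, 2 writes), each memory
        delivers one word per time unit, the unpipelined PE needs 2 time units per
        butterfly, and the pipelined PE needs 1.

```python
def is_balanced(words_per_op, t_pe, n_memories, t_mem=1):
    """Return True if the memories can keep up with the PE.

    Hypothetical model: supply = n_memories / t_mem words per time unit,
    demand = words_per_op / t_pe words per time unit. The comparison is
    cross-multiplied so that it stays in integer arithmetic.
    """
    return n_memories * t_pe >= words_per_op * t_mem


def memories_needed(words_per_op, t_pe, t_mem=1):
    """Smallest number of interleaved memories that balances the PE."""
    # Ceiling division written with floor division on negated operands.
    return -((-words_per_op * t_mem) // t_pe)


# Assumed figures, not taken from the text.
WORDS_PER_BUTTERFLY = 4   # 2 operand reads + 2 result writes
T_PE_PLAIN = 2            # time units per butterfly, unpipelined PE
T_PE_PIPELINED = 1        # time units per butterfly, pipelined PE

plain = memories_needed(WORDS_PER_BUTTERFLY, T_PE_PLAIN)
pipelined = memories_needed(WORDS_PER_BUTTERFLY, T_PE_PIPELINED)
print(plain, pipelined)
```

        Under these assumptions, pipelining halves the PE's execution time and
        therefore doubles the number of interleaved memories required for balance.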




        Cache Memories
        A common technique to reduce the communication demand is to provide the
        processors with fast private memories, so-called cache memories. Each
        processor can then access its own cache memory without interference from the
        other processors. This scheme works well if the relevant data are kept in the
        cache memories and if the communication demands between the cache memories
        and the main memory are relatively small. It also allows the use of slower
        and less expensive main memories. We will, in the examples used as case
        studies, use cache memories to obtain balanced architectures.
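
        The condition under which private caches help can be sketched with a simple
        traffic model. The function names, the notion of a single miss ratio, and
        all numbers are assumptions made for illustration; they do not come from the
        text.

```python
def main_memory_load(n_processors, access_rate, miss_ratio):
    """Accesses per time unit that still reach the shared main memory.

    Hypothetical model: every processor issues `access_rate` accesses per
    time unit, and a fraction `miss_ratio` of them is not served by its cache.
    """
    return n_processors * access_rate * miss_ratio


def main_memory_suffices(n_processors, access_rate, miss_ratio, main_bandwidth):
    """True if the main memory can absorb the remaining traffic."""
    load = main_memory_load(n_processors, access_rate, miss_ratio)
    return load <= main_bandwidth


# Assumed figures: 4 processors, 8 accesses per time unit each,
# a main memory that can serve 10 accesses per time unit.
without_cache = main_memory_suffices(4, 8, 1.0, 10)   # every access goes to main memory
with_cache = main_memory_suffices(4, 8, 0.25, 10)     # caches absorb 75% of the accesses
print(without_cache, with_cache)
```

        In this model the caches pay off only while the miss ratio stays low enough
        that the residual traffic fits within the main memory bandwidth, which is
        what permits a slower main memory.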


        8.9.4 Large Basic Operations
        The third factor in inequality (8.1) affecting architectural balance is the
        execution time for the PEs. Obviously, if we use PEs with a large granularity,
        the execution time will be longer, but more useful work will be done per
        operation. For example, a butterfly PE is preferred over separate PEs that
        perform simpler operations such as add, subtract, and multiply. Further, fewer
        memory transactions may be needed if direct interprocessor communications are
        allowed. It is interesting to note that, contrary to the current trend of RISC
        technology, it is advantageous to use basic operations that are as large as
        possible, since large basic operations tend to do more useful work on a given
        set of input data and thereby tend to reduce the communication requirement.
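
        The effect of granularity on communication can be made concrete by counting
        memory transactions for one radix-2 butterfly, X = a + W*b and Y = a - W*b.
        The sketch below is an illustrative count only. It assumes that every
        operand is one complex word, that the twiddle factor W is fetched from
        memory, and that each PE reads its inputs from and writes its outputs to
        memory unless direct links are allowed. The operation tables and the
        function name are hypothetical.

```python
# Each operation is (name, inputs, outputs); every operand is one complex word.
BUTTERFLY_PE = [
    ("butterfly", ["a", "b", "W"], ["X", "Y"]),
]
SEPARATE_PES = [
    ("multiply", ["W", "b"], ["t"]),   # t = W * b
    ("add",      ["a", "t"], ["X"]),   # X = a + t
    ("subtract", ["a", "t"], ["Y"]),   # Y = a - t
]


def memory_transactions(ops, direct_links=False):
    """Count memory reads and writes needed to execute `ops`.

    With direct_links=True, values produced by one PE and consumed by another
    are passed directly between PEs and never touch the memory.
    """
    produced = {out for _, _, outs in ops for out in outs}
    consumed = {inp for _, ins, _ in ops for inp in ins}
    intermediates = (produced & consumed) if direct_links else set()

    count = 0
    for _, ins, outs in ops:
        count += sum(1 for word in ins + outs if word not in intermediates)
    return count


print(memory_transactions(BUTTERFLY_PE))                     # 3 reads + 2 writes
print(memory_transactions(SEPARATE_PES))                     # 6 reads + 3 writes
print(memory_transactions(SEPARATE_PES, direct_links=True))  # t bypasses the memory
```

        Under these assumptions the single butterfly PE needs 5 transactions, the
        three small PEs need 9, and direct links between the small PEs reduce that
        to 6.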
            In Chapter 9 we will show that it is efficient to use a larger number of
        slower PEs to obtain a balanced architecture [24, 25].


        REFERENCES
         [1] Almasi G.S. and Gottlieb A.: Highly Parallel Computing, Benjamin/Cummings,
             Redwood City, CA, 1989.
         [2] Anceau F.: The Architecture of Microprocessors, Addison-Wesley, Wokingham,
             England, 1986.
         [3] Bhuyan L.N., Yang Q., and Agrawal D.P.: Performance of Multiprocessor
             Interconnection Networks, IEEE Computer, Vol. 22, No. 2, pp. 25-37, Feb.
             1989.
         [4] Chen C.H. (Ed.): Signal Processing Handbook, Marcel Dekker, New York,
             1988.
         [5] DeCegama A.L.: The Technology of Parallel Processing: Parallel Architectures
             and VLSI Hardware, Vol. 1, Prentice Hall, Englewood Cliffs, New Jersey,
             1989.