

        and pass them along to PEs on the right and bottom, respectively. In each time
        step the PEs compute the sum of products:

                                      c := c + a · b

        where a and b are the operands arriving from the left and from above, respectively,
        and c is the partial result accumulated in the PE.
            The result of the computation is stored in each PE. After the whole matrix
        operation is complete, the result can be shifted out, for example, through the right-
        hand side of the array.
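
            As a behavioral illustration of this scheme, the following Python sketch (not
        taken from the book; the function name systolic_matmul and the register names are
        chosen here for illustration) models an N x N array in which each PE multiplies the
        operand arriving from the left by the operand arriving from above, adds the product
        to its local accumulator, and forwards both operands to its right and lower
        neighbors. The rows of A and the columns of B are fed in skewed by one time step
        per row and per column:

            import numpy as np

            def systolic_matmul(A, B):
                # Cycle-level model of an N x N systolic array computing C = A * B.
                # c     : accumulator stored in each PE
                # a_reg : operand currently held by each PE on its way to the right
                # b_reg : operand currently held by each PE on its way downward
                N = A.shape[0]
                c = np.zeros((N, N))
                a_reg = np.zeros((N, N))
                b_reg = np.zeros((N, N))
                for t in range(3 * N - 2):             # cycles to fill and drain the array
                    # pass a-operands one PE to the right, b-operands one PE downward
                    # (the wrapped-around column/row is overwritten by the edge feed below)
                    a_reg = np.roll(a_reg, 1, axis=1)
                    b_reg = np.roll(b_reg, 1, axis=0)
                    for i in range(N):                 # row i of A enters the left edge,
                        k = t - i                      # delayed by i time steps
                        a_reg[i, 0] = A[i, k] if 0 <= k < N else 0.0
                    for j in range(N):                 # column j of B enters the top edge,
                        k = t - j                      # delayed by j time steps
                        b_reg[0, j] = B[k, j] if 0 <= k < N else 0.0
                    c += a_reg * b_reg                 # every PE does one multiply-accumulate
                return c

            # The accumulated results agree with an ordinary matrix product:
            A = np.arange(9.0).reshape(3, 3)
            B = np.arange(9.0, 18.0).reshape(3, 3)
            assert np.allclose(systolic_matmul(A, B), A @ B)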




            Locality in communication is an important property in large systems. Pipeline
        structures exploit locality by providing direct communication paths between
        communicating functional units. The crux of the systolic array approach is to
        ensure that once a data item is brought out from the system memory, it can be used
        effectively at each PE it passes. A high computational throughput can therefore be
        obtained by using many memories, each with a modest memory bandwidth. The ability
        to use each input data item a number of times is just one of the many advantages of
        a systolic architecture. Other advantages include modular expandability, simple and
        regular data and control flow, and the use of simple, almost homogeneous PEs. The
        elimination of global broadcasting and fan-in is also characteristic. However,
        global clock distribution in n-dimensional arrays is difficult.
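            For example, when an N x N array multiplies two N x N matrices, only the 2N^2
        operand words of the two matrices need to be fetched from the system memory, while
        N^3 multiply-accumulate operations are performed; each fetched word is thus reused
        on the order of N times inside the array.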
        Large systolic arrays usually need a large internal data word length. Arrays
        exploit both pipelining and parallelism, but often only a fraction of the cells in
        an array are active in each time slot. One-dimensional arrays for convolution are
        characterized by the fact that their I/O bandwidth requirement is independent of
        the size of the convolution kernel. This contrasts with other array types, such as
        2-D arrays, for which the I/O bandwidth increases with the size of the kernel. For
        adaptive convolution kernels, the critical factor is the adaptation time constant,
        i.e., the rate at which the kernel can be modified. It may be difficult to modify
        the kernel without disturbing the ongoing convolution process. Communication
        bandwidth between the system memory and the systolic array is a limiting factor.
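
            To make the fixed I/O requirement concrete, the following sketch (again an
        illustration written here, loosely following the classical linear convolution
        arrays rather than any particular design in the book) models K PEs that hold the
        kernel, while input samples flow to the right and partial sums flow to the left,
        each advancing one PE per clock. Exactly one sample enters and one result leaves
        at the left-hand end every second clock cycle, independent of K:

            def systolic_fir(x, w):
                # Linear systolic array computing the sliding-window sums
                #   y[j] = w[0]*x[j] + w[1]*x[j+1] + ... + w[K-1]*x[j+K-1].
                K = len(w)
                h = list(reversed(w))          # PE k permanently stores w[K-1-k]
                n_out = len(x) - K + 1
                x_reg = [0.0] * K              # sample held in each PE (moving right)
                x_valid = [False] * K
                y_reg = [0.0] * K              # partial sum held in each PE (moving left)
                y_valid = [False] * K
                out = []
                for t in range(2 * n_out + 2 * K):
                    if y_valid[0]:
                        out.append(y_reg[0])   # a finished result leaves the left end
                    # partial sums advance one PE to the left; a fresh zero-valued sum
                    # is injected at the right end every second clock
                    y_reg[:-1] = y_reg[1:]
                    y_valid[:-1] = y_valid[1:]
                    j = (t - (K - 1)) // 2
                    y_reg[-1] = 0.0
                    y_valid[-1] = t >= K - 1 and (t - (K - 1)) % 2 == 0 and j < n_out
                    # input samples advance one PE to the right; a new sample enters
                    # at the left end every second clock
                    x_reg[1:] = x_reg[:-1]
                    x_valid[1:] = x_valid[:-1]
                    feed = t % 2 == 0 and t // 2 < len(x)
                    x_reg[0] = x[t // 2] if feed else 0.0
                    x_valid[0] = feed
                    # every PE does one multiply-accumulate when its operands are valid
                    for k in range(K):
                        if x_valid[k] and y_valid[k]:
                            y_reg[k] += h[k] * x_reg[k]
                return out

            print(systolic_fir([1.0, 2.0, 3.0, 4.0, 5.0], [0.5, 0.25, 0.25]))
            # [1.75, 2.75, 3.75]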



        8.8 WAVE FRONT ARRAYS

        A wave front array is an n-dimensional structural pipeline with asynchronous
        communication between the PEs. The principle is illustrated in Figure 8.23. The
        main difference between the systolic array and the wave front array is that
        communication between PEs in the latter case is maintained by a handshaking
        protocol.
            The program memory issues a sequence of instructions and the memories provide
        data requested by the top row and leftmost column of PEs. As data and instructions
        become available to the PEs, the corresponding operations are executed and the
        results passed on to the neighbors. The operations for a given set of indices
        propagate diagonally downward like a plane wave. Operations in this architecture
        are controlled by the availability of data and not by a clock signal. Hence, the
        wave front array can accommodate more irregular algorithms than the systolic
        array, such as algorithms where the execution time in the PEs is data dependent.
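
            As a rough data-driven counterpart to the clocked systolic sketch shown
        earlier (again an illustration written here; the name wavefront_matmul and the
        FIFO model are chosen for the example), each PE below fires as soon as both of its
        input queues hold an operand, mimicking the handshake. The order in which ready
        PEs fire is deliberately randomized to emphasize that no global clock is involved,
        yet the result is unchanged:

            from collections import deque
            import random

            def wavefront_matmul(A, B, seed=0):
                # Data-driven model of an N x N wave front array computing C = A * B.
                # h[i][j]: FIFO feeding PE(i, j) from its left neighbor (memory at j = 0)
                # v[i][j]: FIFO feeding PE(i, j) from above (memory at i = 0)
                N = len(A)
                h = [[deque() for _ in range(N + 1)] for _ in range(N)]
                v = [[deque() for _ in range(N)] for _ in range(N + 1)]
                for i in range(N):
                    h[i][0].extend(A[i])                       # row i of A from the left
                for j in range(N):
                    v[0][j].extend(B[k][j] for k in range(N))  # column j of B from the top
                c = [[0.0] * N for _ in range(N)]
                rng = random.Random(seed)
                fired = True
                while fired:                          # no clock: run until nothing is ready
                    fired = False
                    pes = [(i, j) for i in range(N) for j in range(N)]
                    rng.shuffle(pes)                  # any ready PE may fire first
                    for i, j in pes:
                        if h[i][j] and v[i][j]:       # handshake: both operands available
                            a = h[i][j].popleft()
                            b = v[i][j].popleft()
                            c[i][j] += a * b          # local multiply-accumulate
                            h[i][j + 1].append(a)     # forward a to the right neighbor
                            v[i + 1][j].append(b)     # forward b to the neighbor below
                            fired = True
                return c

            # The data-driven execution gives the same product as the clocked array:
            print(wavefront_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
            # [[19.0, 22.0], [43.0, 50.0]]

        In a real wave front array the buffers between PEs are finite and the handshake
        also prevents a PE from sending to a neighbor that is not ready to receive; the
        unbounded queues above are only a modeling convenience.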