

        and pass them along to PEs on the right and bottom, respectively. In each time
        step the PEs compute the sum of products:

                                      c := c + a · b

        where a and b are the operands arriving from the left and from above, respectively,
        and c is the partial result accumulated in the PE.
            The result of the computation is stored in each PE. After the whole matrix
        operation is complete, the result can be shifted out, for example, through the right-
        hand side of the array.
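
            As a behavioral illustration of this scheme, the following Python sketch (not
        taken from the book; the function name systolic_matmul and the register names are
        chosen here for illustration) models an N x N array in which each PE multiplies the
        operand arriving from the left by the operand arriving from above, adds the product
        to its local accumulator, and forwards both operands to its right and lower
        neighbors. The rows of A and the columns of B are fed in skewed by one time step
        per row and per column:

            import numpy as np

            def systolic_matmul(A, B):
                # Cycle-level model of an N x N systolic array computing C = A * B.
                # c     : accumulator stored in each PE
                # a_reg : operand currently held by each PE on its way to the right
                # b_reg : operand currently held by each PE on its way downward
                N = A.shape[0]
                c = np.zeros((N, N))
                a_reg = np.zeros((N, N))
                b_reg = np.zeros((N, N))
                for t in range(3 * N - 2):             # cycles to fill and drain the array
                    # pass a-operands one PE to the right, b-operands one PE downward
                    # (the wrapped-around column/row is overwritten by the edge feed below)
                    a_reg = np.roll(a_reg, 1, axis=1)
                    b_reg = np.roll(b_reg, 1, axis=0)
                    for i in range(N):                 # row i of A enters the left edge,
                        k = t - i                      # delayed by i time steps
                        a_reg[i, 0] = A[i, k] if 0 <= k < N else 0.0
                    for j in range(N):                 # column j of B enters the top edge,
                        k = t - j                      # delayed by j time steps
                        b_reg[0, j] = B[k, j] if 0 <= k < N else 0.0
                    c += a_reg * b_reg                 # every PE does one multiply-accumulate
                return c

            # The accumulated results agree with an ordinary matrix product:
            A = np.arange(9.0).reshape(3, 3)
            B = np.arange(9.0, 18.0).reshape(3, 3)
            assert np.allclose(systolic_matmul(A, B), A @ B)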




            Locality in communication is an important property in large systems. Pipeline
        structures exploit locality by providing direct communication paths between
        communicating functional units. The crux of the systolic array approach is to
        ensure that once a data item is brought out from the system memory, it can be used
        effectively at each PE it passes. A high computational throughput can therefore be
        obtained by using many memories, each with a modest memory bandwidth. The ability
        to use each input data item a number of times is just one of the many advantages of
        a systolic architecture. Other advantages include modular expandability, simple and
        regular data and control flow, and the use of simple, almost homogeneous PEs. The
        elimination of global broadcasting and fan-in is also characteristic. However,
        global clock distribution in n-dimensional arrays is difficult.
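            For example, when an N x N array multiplies two N x N matrices, only the 2N^2
        operand words of the two matrices need to be fetched from the system memory, while
        N^3 multiply-accumulate operations are performed; each fetched word is thus reused
        on the order of N times inside the array.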
        Large systolic arrays usually need a large internal data word length. Arrays
        exploit both pipelining and parallelism, but often only a fraction of the cells in
        an array are active in each time slot. One-dimensional arrays for convolution are
        characterized by the fact that their I/O bandwidth requirement is independent of
        the size of the convolution kernel. This contrasts with other array types, such as
        2-D arrays, for which the I/O bandwidth increases with the size of the kernel. For
        adaptive convolution kernels, the critical factor is the adaptation time constant,
        i.e., the rate at which the kernel can be modified. It may be difficult to modify
        the kernel without disturbing the ongoing convolution process. Communication
        bandwidth between the system memory and the systolic array is a limiting factor.
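
            To make the fixed I/O requirement concrete, the following sketch (again an
        illustration written here, loosely following the classical linear convolution
        arrays rather than any particular design in the book) models K PEs that hold the
        kernel, while input samples flow to the right and partial sums flow to the left,
        each advancing one PE per clock. Exactly one sample enters and one result leaves
        at the left-hand end every second clock cycle, independent of K:

            def systolic_fir(x, w):
                # Linear systolic array computing the sliding-window sums
                #   y[j] = w[0]*x[j] + w[1]*x[j+1] + ... + w[K-1]*x[j+K-1].
                K = len(w)
                h = list(reversed(w))          # PE k permanently stores w[K-1-k]
                n_out = len(x) - K + 1
                x_reg = [0.0] * K              # sample held in each PE (moving right)
                x_valid = [False] * K
                y_reg = [0.0] * K              # partial sum held in each PE (moving left)
                y_valid = [False] * K
                out = []
                for t in range(2 * n_out + 2 * K):
                    if y_valid[0]:
                        out.append(y_reg[0])   # a finished result leaves the left end
                    # partial sums advance one PE to the left; a fresh zero-valued sum
                    # is injected at the right end every second clock
                    y_reg[:-1] = y_reg[1:]
                    y_valid[:-1] = y_valid[1:]
                    j = (t - (K - 1)) // 2
                    y_reg[-1] = 0.0
                    y_valid[-1] = t >= K - 1 and (t - (K - 1)) % 2 == 0 and j < n_out
                    # input samples advance one PE to the right; a new sample enters
                    # at the left end every second clock
                    x_reg[1:] = x_reg[:-1]
                    x_valid[1:] = x_valid[:-1]
                    feed = t % 2 == 0 and t // 2 < len(x)
                    x_reg[0] = x[t // 2] if feed else 0.0
                    x_valid[0] = feed
                    # every PE does one multiply-accumulate when its operands are valid
                    for k in range(K):
                        if x_valid[k] and y_valid[k]:
                            y_reg[k] += h[k] * x_reg[k]
                return out

            print(systolic_fir([1.0, 2.0, 3.0, 4.0, 5.0], [0.5, 0.25, 0.25]))
            # [1.75, 2.75, 3.75]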



        8.8 WAVE FRONT ARRAYS

        A wave front array is an n-dimensional structural pipeline with asynchronous
        communication between the PEs. The principle is illustrated in Figure 8.23. The
        main difference between the systolic array and the wave front array is that
        communication between PEs in the latter case is maintained by a handshaking
        protocol.
            The program memory issues a sequence of instructions and the memories provide
        data requested by the top row and leftmost column of PEs. As data and instructions
        become available to the PEs, the corresponding operations are executed and the
        results passed on to the neighbors. The operations for a given set of indices
        propagate diagonally downward like a plane wave. Operations in this architecture
        are controlled by the availability of data and not by a clock signal. Hence, the
        wave front array can accommodate more irregular algorithms than the systolic
        array, such as algorithms where the execution time in the PEs is data dependent.
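
            As a rough data-driven counterpart to the clocked systolic sketch shown
        earlier (again an illustration written here; the name wavefront_matmul and the
        FIFO model are chosen for the example), each PE below fires as soon as both of its
        input queues hold an operand, mimicking the handshake. The order in which ready
        PEs fire is deliberately randomized to emphasize that no global clock is involved,
        yet the result is unchanged:

            from collections import deque
            import random

            def wavefront_matmul(A, B, seed=0):
                # Data-driven model of an N x N wave front array computing C = A * B.
                # h[i][j]: FIFO feeding PE(i, j) from its left neighbor (memory at j = 0)
                # v[i][j]: FIFO feeding PE(i, j) from above (memory at i = 0)
                N = len(A)
                h = [[deque() for _ in range(N + 1)] for _ in range(N)]
                v = [[deque() for _ in range(N)] for _ in range(N + 1)]
                for i in range(N):
                    h[i][0].extend(A[i])                       # row i of A from the left
                for j in range(N):
                    v[0][j].extend(B[k][j] for k in range(N))  # column j of B from the top
                c = [[0.0] * N for _ in range(N)]
                rng = random.Random(seed)
                fired = True
                while fired:                          # no clock: run until nothing is ready
                    fired = False
                    pes = [(i, j) for i in range(N) for j in range(N)]
                    rng.shuffle(pes)                  # any ready PE may fire first
                    for i, j in pes:
                        if h[i][j] and v[i][j]:       # handshake: both operands available
                            a = h[i][j].popleft()
                            b = v[i][j].popleft()
                            c[i][j] += a * b          # local multiply-accumulate
                            h[i][j + 1].append(a)     # forward a to the right neighbor
                            v[i + 1][j].append(b)     # forward b to the neighbor below
                            fired = True
                return c

            # The data-driven execution gives the same product as the clocked array:
            print(wavefront_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]]))
            # [[19.0, 22.0], [43.0, 50.0]]

        In a real wave front array the buffers between PEs are finite and the handshake
        also prevents a PE from sending to a neighbor that is not ready to receive; the
        unbounded queues above are only a modeling convenience.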