Page 393 - DSP Integrated Circuits
Chapter 8 DSP Architectures
Figure 8.24 Datawave multiprocessor architecture
transfers. The PEs use local clocks derived from the global clock. Figure 8.25
shows the architecture of the PEs. The core of a PE is a 12-bit RISC processor with
local program and data stores. Three 12-bit ring buses are used to connect the PE
core with adjacent PEs via FIFOs.
The two outer buses are used to deliver data to the MAC (multiplier-accumulator)
and ALU, while the third bus is used to deliver the results to functional units
and the outside world. The MAC has a 12 × 12-bit multiplier and a 29-bit
accumulator. The ALU works in parallel with the MAC. Each PE, which is pipelined, can
start a multiply-and-accumulate operation every clock cycle (125 MHz). Hence, a
very high peak performance of 4 GOPS is obtained.
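
The MAC datapath and the peak-performance figure can be illustrated with a small model. This is only a sketch under stated assumptions: the number of PEs (16) and the convention of counting a multiply and an add as two operations are not given in this passage and are chosen here because they reproduce the quoted 4 GOPS. The wrap-around overflow behavior of the accumulator is likewise an assumption; the actual hardware may saturate instead.

```python
MULT_BITS = 12     # operand width of the multiplier (from the text)
ACC_BITS = 29      # accumulator width (from the text)
CLOCK_HZ = 125e6   # one MAC can start per clock cycle (from the text)
NUM_PES = 16       # ASSUMPTION: PE count is not stated in this passage
OPS_PER_MAC = 2    # ASSUMPTION: multiply and add counted as two operations


def wrap_signed(value, bits):
    """Wrap an integer into a two's-complement field of the given width."""
    value &= (1 << bits) - 1
    if value >> (bits - 1):
        value -= 1 << bits
    return value


def mac(acc, a, b):
    """One multiply-accumulate step; ASSUMES the accumulator wraps on overflow."""
    return wrap_signed(acc + a * b, ACC_BITS)


def accumulate(pairs):
    """Run a sequence of (a, b) operand pairs through the MAC, starting from zero."""
    acc = 0
    for a, b in pairs:
        acc = mac(acc, a, b)
    return acc


# A signed 12 x 12-bit product fits in 24 bits, so 29 - 24 = 5 guard bits remain.
guard_bits = ACC_BITS - 2 * MULT_BITS

# Peak throughput under the assumptions above, in giga-operations per second.
peak_gops = NUM_PES * OPS_PER_MAC * CLOCK_HZ / 1e9

if __name__ == "__main__":
    worst = -(1 << (MULT_BITS - 1))          # most negative 12-bit operand
    print(guard_bits, peak_gops)
    print(accumulate([(worst, worst)] * 32))  # still within the 29-bit range
```

With five guard bits, at least 2^5 = 32 worst-case products can be accumulated before the 29-bit range is exceeded, which is why the accumulator is wider than the 24-bit product.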
The program memory can store only 64 instructions, each 46 bits wide, but this is
usually sufficient since the chip is assumed to run very high sample rate
applications. There is time to execute only a few instructions per sample. Program
memories are loaded via a serial bus connected to all PEs. The local data memory is a
four-port register file with 16 words. Unfortunately, current technologies do not allow
large on-chip data memories. Such large memories are needed, for example, to
store several lines, or even several frames, of a TV image.
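
To make the instruction budget and the memory shortfall concrete, the following sketch compares the PE resources with the requirements of a video signal. The video parameters are assumptions for illustration and do not come from this passage: 13.5 MHz luminance sampling with 720 active samples per line and 576 active lines (the 625-line format of ITU-R BT.601), and one 12-bit word of storage per sample.

```python
CLOCK_HZ = 125e6    # PE clock rate (from the text)
DATA_WORDS = 16     # local data memory size in words (from the text)
WORD_BITS = 12      # PE word length (from the text)

# ASSUMPTIONS for illustration: BT.601 625-line video parameters.
SAMPLE_RATE_HZ = 13.5e6
SAMPLES_PER_LINE = 720
LINES_PER_FRAME = 576


def cycles_per_sample(sample_rate_hz, clock_hz=CLOCK_HZ):
    """Whole clock cycles available in one PE between consecutive samples."""
    return int(clock_hz // sample_rate_hz)


def storage_bits(samples, word_bits=WORD_BITS):
    """Bits needed to hold the given number of samples, one word per sample."""
    return samples * word_bits


local_bits = storage_bits(DATA_WORDS)
line_bits = storage_bits(SAMPLES_PER_LINE)
frame_bits = storage_bits(SAMPLES_PER_LINE * LINES_PER_FRAME)

if __name__ == "__main__":
    print("cycles per sample:", cycles_per_sample(SAMPLE_RATE_HZ))
    print("local data memory:", local_bits, "bits")
    print("one line needs", line_bits // local_bits, "times the local memory")
    print("one frame needs", frame_bits // local_bits, "times the local memory")
```

Under these assumptions a PE has about nine cycles per sample, so a 64-instruction program memory is ample, while a single line already needs 45 times the capacity of the 16-word local data memory.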