Page 393 - DSP Integrated Circuits

378                                              Chapter 8 DSP Architectures


                       Figure 8.24 Datawave multiprocessor architecture



        transfers. The PEs use local clocks derived from the global clock. Figure 8.25
        shows the architecture of the PEs. The core of a PE is a 12-bit RISC processor with
        local program and data stores. Three 12-bit ring buses are used to connect the PE
        core with adjacent PEs via FIFOs.
            The two outer buses are used to deliver data to the MAC (multiplier-accumulator)
        and ALU while the third bus is used to deliver the results to functional units
        and the outside world. The MAC has a 12 × 12-bit multiplier and a 29-bit accumulator.
        The ALU works in parallel with the MAC. Each PE, which is pipelined, can
        start a multiply-and-accumulate operation every clock cycle (125 MHz). Hence, a
        very high peak performance of 4 GOPS is obtained.
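
        The figures in this paragraph can be checked with a little arithmetic. The
        sketch below is illustrative only. It assumes two's-complement operands, counts
        one multiply-and-accumulate as two operations, and assumes 16 PEs on the chip,
        a count that this passage does not state but that is consistent with the quoted
        4 GOPS. All function names are hypothetical.

```python
# Illustrative arithmetic only; the names and the PE count are assumptions.

def guard_bits(operand_bits: int, accumulator_bits: int) -> int:
    """Accumulator bits beyond the full-width product of two operands."""
    product_bits = 2 * operand_bits          # 12 x 12 -> 24-bit product
    return accumulator_bits - product_bits


def safe_accumulations(operand_bits: int, accumulator_bits: int) -> int:
    """Number of worst-case products that can be summed without overflow."""
    # Largest product magnitude for two's-complement operands: (-2^(n-1))^2.
    max_product = (1 << (operand_bits - 1)) ** 2
    # Largest positive value a two's-complement accumulator can hold.
    max_accumulator = (1 << (accumulator_bits - 1)) - 1
    return max_accumulator // max_product


def peak_gops(num_pes: int, clock_hz: float, ops_per_cycle: int = 2) -> float:
    """Peak rate in GOPS, counting one MAC as a multiply plus an add."""
    return num_pes * clock_hz * ops_per_cycle / 1e9


print(guard_bits(12, 29))            # 5
print(safe_accumulations(12, 29))    # 63
print(peak_gops(16, 125e6))          # 4.0
```

        Under these assumptions the 29-bit accumulator leaves five guard bits above the
        24-bit product, so several tens of worst-case products can be summed before
        overflow becomes possible.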
            The program memory can store only 64 instructions, each 46 bits wide, but
        this is usually sufficient since the chip is intended for applications with very
        high sample rates, where there is time to execute only a few instructions per
        sample. Program memories are loaded via a serial bus connected to all PEs. The
        local data memory is a four-port register with 16 words. Unfortunately, current
        technologies do not allow large on-chip data memories. Such large memories are
        needed, for example, to store several lines, or even several frames, of a TV image.
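
        To give a feel for these constraints, the following rough sketch estimates the
        instruction budget per sample and compares the local data memory with one line
        of video. The 13.5 MHz sample rate and the 720 samples per line are assumptions
        borrowed from standard-definition digital video (ITU-R BT.601) and are not taken
        from the text. The 12-bit word length is assumed from the width of the PE data
        path, and the function names are hypothetical.

```python
# Rough estimates; the video parameters below are assumptions, not chip data.

CLOCK_HZ = 125e6            # PE clock rate quoted in the text
SAMPLE_RATE_HZ = 13.5e6     # assumed: BT.601 luminance sampling rate
WORD_BITS = 12              # assumed: word length of the PE data path
LOCAL_WORDS = 16            # local data memory size quoted in the text
SAMPLES_PER_LINE = 720      # assumed: active luminance samples per line


def cycles_per_sample(clock_hz: float, sample_rate_hz: float) -> int:
    """Whole clock cycles available to one PE for each input sample."""
    return int(clock_hz // sample_rate_hz)


def line_to_memory_ratio(local_words: int, samples_per_line: int) -> float:
    """How many times larger one video line is than the local memory."""
    return samples_per_line / local_words


print(cycles_per_sample(CLOCK_HZ, SAMPLE_RATE_HZ))          # 9
print(LOCAL_WORDS * WORD_BITS)                              # 192
print(line_to_memory_ratio(LOCAL_WORDS, SAMPLES_PER_LINE))  # 45.0
```

        With these assumed numbers a PE has about nine clock cycles per sample, and a
        single video line is 45 times larger than the 16-word local memory, which
        illustrates why line and frame stores cannot be held in the PEs.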