Page 379 - DSP Integrated Circuits

364                                              Chapter 8 DSP Architectures


        RAM and 64k x 24-bit program memory. The DSP56002 is available in a 132-pin
        pin-grid array package or plastic quad flat pack (PQFP).
            The Motorola 56002 has three functional units: (a) multiply-accumulate unit,
        (b) address generation unit, and (c) program control unit. This architecture makes
        it possible to perform two memory operations concurrently with a multiplication and an
        addition. The address generation unit is responsible for calculating the different
        addresses and contains its own set of registers (the address registers of the DSP).
        The ALUs used for address calculations perform only simple additions, but they
        also support modulo and bit-reversed address calculations. The program control
        unit is responsible for fetching and decoding instructions, and it contains
        control registers such as the program counter, the stack pointer, and the
        status register. The program control unit also contains the
        hardware to support low-overhead looping and interrupt handling.
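
        The two special addressing modes mentioned above can be illustrated with a
        small software model. This is only a sketch: the function names and the
        buffer address 0x100 are hypothetical, and the code models the resulting
        address sequences, not the way the address ALUs compute them in hardware.

```python
def modulo_update(addr, step, base, length):
    """Advance addr by step inside the circular buffer [base, base + length)."""
    return base + (addr - base + step) % length


def bit_reverse(index, bits):
    """Return index with its lowest `bits` bits in reversed order."""
    result = 0
    for _ in range(bits):
        result = (result << 1) | (index & 1)
        index >>= 1
    return result


# Modulo addressing: a 4-word circular buffer at the (hypothetical) address 0x100.
# The pointer wraps back to the base after the last word.
addr = 0x100
visited = []
for _ in range(6):
    visited.append(addr)
    addr = modulo_update(addr, 1, base=0x100, length=4)

# Bit-reversed addressing: access order for an 8-point FFT (3 address bits).
fft_order = [bit_reverse(i, 3) for i in range(8)]
```

        Modulo addressing keeps a delay line or coefficient table in a fixed block
        of memory without explicit wrap-around tests, while bit-reversed addressing
        yields the shuffled data order needed by radix-2 FFT algorithms.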
            Some DSP chips use a VLIW¹ with separate fields in the instruction control
        word to control the different units. For example, the Motorola DSP56001 and
        DSP56002 perform an instruction prefetch using a three-stage instruction pipe-
        line, a 24 x 24-bit multiply, a 56-bit addition, two data moves, and two address
        pointer updates in a single instruction cycle. The DSP56002 can perform up to six
        operations simultaneously, which corresponds to a peak performance of 120 MOPS
        @ 40 MHz. Typical power consumption is less than 0.5 W.
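
        The parallel operations listed above map naturally onto the inner loop of
        an FIR filter. The following sketch is a behavioural model only, under
        stated assumptions: operands are treated as plain integers (the fractional
        number format of the real device is ignored), and the names wrap_signed
        and fir_step are hypothetical. Each pass through the loop corresponds to
        the work the text attributes to a single instruction.

```python
def wrap_signed(value, bits):
    """Wrap an integer into the two's-complement range of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    return value - (1 << bits) if value >> (bits - 1) else value


def fir_step(samples, coeffs, newest):
    """Compute one FIR output; samples is a circular buffer, newest its latest index."""
    n = len(coeffs)
    acc = 0                       # models the 56-bit accumulator
    x_ptr, c_ptr = newest, 0      # two address pointers
    for _ in range(n):
        x = samples[x_ptr]        # data move 1: fetch a sample
        c = coeffs[c_ptr]         # data move 2: fetch a coefficient
        acc = wrap_signed(acc + x * c, 56)  # multiply, then 56-bit addition
        x_ptr = (x_ptr - 1) % n   # pointer update 1 (modulo addressing)
        c_ptr += 1                # pointer update 2 (linear addressing)
    return acc
```

        For example, fir_step([1, 2, 3, 4], [1, 1, 1, 1], 3) sums all four samples
        and returns 10.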
            Typical performance of the DSP56002 is that an N-tap FIR filter requires N+7
        instruction cycles, while N second-order sections in direct form II require 5N+1
        instruction cycles. A 91-tap FIR filter can be executed in 4.9 µs and a sixth-order
        filter in 0.8 µs @ 40 MHz. A 256-point and a 1024-point FFT, without bit-reversal,
        take 0.78 and 3.89 ms, respectively.
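
        The quoted execution times can be checked against the cycle counts. The
        sketch below assumes that each counted cycle lasts 50 ns, that is, two
        clock periods at 40 MHz. This value is inferred from the numbers in the
        text and is not stated there for the DSP56002. The constant and function
        names are hypothetical.

```python
CLOCK_HZ = 40e6
CLOCKS_PER_CYCLE = 2                      # assumption inferred from the quoted times
CYCLE_S = CLOCKS_PER_CYCLE / CLOCK_HZ     # 50 ns per cycle


def fir_cycles(taps):
    """Cycle count quoted for an N-tap FIR filter: N + 7."""
    return taps + 7


def biquad_cycles(sections):
    """Cycle count quoted for N second-order direct form II sections: 5N + 1."""
    return 5 * sections + 1


# 91-tap FIR filter: 98 cycles, about 4.9 microseconds.
fir_time_us = fir_cycles(91) * CYCLE_S * 1e6

# Sixth-order filter = three second-order sections: 16 cycles, about 0.8 microseconds.
iir_time_us = biquad_cycles(3) * CYCLE_S * 1e6

# Six operations per cycle give a peak rate of about 120 MOPS.
peak_mops = 6 / CYCLE_S / 1e6
```

        Under the same assumption the six simultaneous operations per cycle
        reproduce the 120 MOPS peak figure quoted for a 40 MHz clock.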
            The DSP56002 is an enhanced, software-compatible version of the DSP56001.
        The current versions have clock frequencies of 40 to 80 MHz. A low-power version,
        DSP56L002, that operates with a power supply voltage of 3.3 V is also available.
        The main new features, compared to the DSP56001, are an on-chip emulator, a
        phase-locked loop (PLL) clock oscillator, improved external memory interface, and
        some new instructions. The new instructions include INC (increment) and DEC
        (decrement), double-precision multiply, and debug instructions. The latter make
        it possible to examine registers, memory, and on-chip peripherals during program
        development.


        8.3.7 Motorola DSP96001™ and DSP96002™
        The DSP96001 is a floating-point version of the DSP56001 aimed at multimedia
        applications. It can be used as a stand-alone device or in multiprocessor
        configurations. The clock frequency of the current version is 40 MHz and the
        instruction cycle time is 50 ns, with a peak performance of 40 MFLOPS. The Motorola DSP96001 has
        three functional units operating in parallel: (a) IEEE 32-bit floating-point multiply-
        and-accumulate unit, (b) address generation unit, and (c) program control unit. In
        addition, two channels of DMA, six on-chip memories, and various external inter-



        ¹ VLIW (Very Long Instruction Word): A parallel architecture that uses multiple, independent
           functional units and packages multiple operations into one very long instruction. The
           instruction has a field for each processing element (PE).