Page 375 - DSP Integrated Circuits
P. 375

360                                              Chapter 8 DSP Architectures


            Most DSP routines are fairly small and can reside either in the on-chip pro-
        gram memory or in a small external EPROM. Most DSP routines are hand-coded
        to fully exploit the DSPs architectural features [6]—for example, to reduce the
        inefficiencies associated with flushing long pipelines, unwinding short loops and
        interleaving several independent loops to keep the pipelines full [22].
            Branch instructions waste processing time in heavily pipelined architectures.
        The typical solution is to use dedicated hardware that allows zero-overhead loop.
        However, hardware support is often provided for only one loop due to the cost.


        8.3.1 Harvard Architecture
        In the classical von Neumann architecture the ALU and the control unit are con-
        nected to a single memory that stores both the data values and the program
        instructions. During execution, an instruction is read from the memory and
        decoded, appropriate operands are fetched from the memory, and, finally, the
        instruction is executed. The main disadvantage is that memory bandwidth
        becomes the bottleneck in such an architecture.
            The most common operation a standard DSP processor must be able to per-
        form efficiently is multiply-and-accumulate. This operation should ideally be per-
        formed in a single instruction cycle. This means that two values must be read from
        memory and (depending on organization) one value must be written, or two or
        more address registers must be updated, in that cycle. Hence, a high memory
        bandwidth is just as important as a fast multiply-and-accumulate operation.
            Several memory buses
        and on-chip memories are
        therefore used so that
        reads and writes to differ-
        ent memory units can take
        place concurrently. Fur-
        thermore, pipelining is
        used    extensively   to
        increase the throughput.
        Two separate memories
        are used in the classical
        Harvard architecture as
        shown in Figure 8.5. One of
        the memories is used
        exclusively for data while
        the other is used for instructions. The Harvard architecture therefore achieves a
        high degree of concurrency. Current DSP architectures use multiple buses and
        execution units to achieve even higher degrees of concurrency. Chips with multiple
        DSP processors and a RISC processor are also available.


        8.3.2 TMS32010™
        The Texas Instruments TMS32010 family [14] of low-cost fixed-point DSPs is based
        on a single-bus architecture as shown in Figure 8.6. The family, introduced in 1983,
        has various members with differing RAM and ROM configurations. The architecture
        has 32-bit registers, one 16—bit wide bus, and serial I/O ports. This architecture
   370   371   372   373   374   375   376   377   378   379   380