Page 294 - DSP Integrated Circuits

P. 294

7.2 A Direct Mapping Technique 279

process. The complexity of the processes (granularity) can vary from
atomic (indivisible) operations (fine granularity) up to complex processes
(coarse granularity). Typically the higher-level processes will be mapped
onto virtual machines, while the lower-level processes will be mapped
onto hardware components such as PEs and memories. The partitioning
of the system into a hierarchy of processes is a crucial design step that
will have a major influence on both the system cost and performance.
The partitioning should be done so that the processes can be mapped
onto an appropriate software-hardware structure. An important issue is
interprocess communication. Generally, it is favorable to partition the
algorithm in such a way that communication between processes is
minimized, since a data transfer between two processes requires the
processes to be synchronized at that point in time. Hence, one of the
processes must wait for the other, and the hardware structure associated
with the first process may become idle.
Some DSP algorithms (for example, digital filters) are usually
described using signal-flow graphs, which depict the data flow in the
algorithm. The partitioning of such systems is relatively simple, since it
mainly involves arithmetic operations, and the number of levels in the
hierarchy is low. However, the signal-flow graph is a less suitable
representation for algorithms that have more complicated control
structures—for example, the FFT. Many algorithms like the FFT that are
described using a high-level programming language (for example, Pascal
or C) contain "for"-loops. The corresponding implementation will
therefore have a control structure that implements a "for"-loop by
repeatedly executing the hardware structure that performs the
operations within the loop. Hence, a loop in the algorithm will generally
be identified as a control process. In some cases, however, it may be
advantageous or necessary to increase the computational parallelism by
unwinding the loop.
4. In practice, most complex DSP algorithms are derived in sequential
form by using an imperative high-level programming language. The
sequential algorithm must therefore be transformed into a more
parallel description. The processes in the parallel description must be
scheduled in time so that the specified performance is met and the
amount of resources required is minimized. Often, only the number of
PEs is minimized, but it is also possible to minimize the memory and
communication resources. In fact, it is possible to minimize the total
amount of hardware as well as the power consumption by proper
scheduling.
5. Synthesis of an optimal architecture starts from the schedule that
determines the number and type of PEs, memories, and communication
channels for a class of optimal architectures. It turns out that the generic
architecture is a shared-memory architecture.
6. Adequate resources are allocated according to the given schedule. The
number of PEs required is determined by the number of concurrent
operations while the number of memories, or memory ports, is
determined by the largest number of simultaneous memory transactions.
This design step is called resource allocation.

289 290 291 292 293 294 295 296 297 298 299