Page 313 - DSP Integrated Circuits
P. 313
298 Chapter 7 DSP System Design
period is three time units. We need two
multipliers and one adder. It is not possi-
ble to reschedule operations within the
loop to further reduce the number of
resources. The operations are divided into
two sections so that the critical paths are
broken into pieces of lengths one and two.
The loop is folded as shown in Figures
7.16 and 7.17.
There are two main options: We can
either minimize loop time (sample period)
or minimize the resources. Figure 7.16
illustrates the first case—the loop time is
decreased to two.
In the second case, illustrated in
Figure 7.17, the number of PEs is
minimized. Only one multiplier and one
adder are required. The drawback is that
latency from input to output has been
increased from the original three to six.
Unfortunately, excessive delay is a major
problem in many signal processing
applications. Loop-folding is equivalent
to recognizing that some operations
belonging to an interval or iteration can
be started earlier than others and be
executed at the same time as operations
from previous intervals. We do not have
to execute all operations in an interval
before we start to execute the operations
belonging the next interval. This
resembles scheduling a cyclic graph
instead of a DAG. Loop-unfolding is a
simplified version of the more general
cyclic scheduling formulation, which is
described in the next section.
7.5.4 Cyclic Scheduling Formulation
In the single interval and blocking formulations we did not take into account the
fact that the schedule is inherently periodic. It is therefore convenient to connect K
computation graphs in a closed loop, as shown in Figure 7.18 [24]. This will result
in a periodic scheduling formulation that allows a schedule period that is K times
longer then the sample interval. This cyclic formulation will under certain
conditions result in a maximally fast schedule. This is the most general scheduling
formulation and it can be used to find resource-optimal schedules using arbitrary
constraints on, for example, sample rate and latency.