Page 340 - DSP Integrated Circuits
P. 340

7.7 FFT Processor, Cont.                                              325

        7.7.1 Scheduling of the Inner Loops

        The processes in the inner loop are shown in Figure 7.54, where the memory pro-
        cesses have been explicitly shown as read and write processes. The precedence
        relationships between the loops has been denoted with dashed arrows. For sched-
        uling purposes we have included the estimated lifetimes (latency) of the processes.
        The lifetimes are measured in units of memory clock cycles, TM = 31.25 ns. We
        assume that the memory and WP processes are pipelined internally so that the
        throughput is one data item per unit of time. Thus, a read or write cycle can be
        performed every memory cycle, but the internal pipeline causes the data item
        to be ready two memory cycles after the corresponding address has been
        applied. More specifically, the memory pipeline latches will be placed within
        the address decoders of the RAMs to allow higher throughput rates. The but-
        terfly processes are internally pipelined to support one computation per four
        memory cycles.




























                            Figure 7.54 Processes in the inner loop



            The inner loop in Figure 7.54 is too time consuming since the loop time is 16
        time units. The FFT computation would take 16 • 5120T M/2 = 1.28 ms. This is too
        long since only 0.32 ms is available. Loop folding has therefore been applied by
        overlapping successive loops, as illustrated in Figure 7.55, so that the loop time
        becomes equal to four memory cycles. The reads and writes for the ASAP schedule
        shown in Figure 7.55 have been denoted by RI to R^ and Wi to W^ while the
        W_Process is denoted with WP.
            At the most two concurrent read and write operations are possible, since we
        have selected to use only two memories. In order to show more clearly how many
        concurrent processes the initial schedule contains, we fold the schedule
        modulo(4)—i.e., modulo the desired loop time. The folded schedule is shown in
        Figure 7.56.
   335   336   337   338   339   340   341   342   343   344   345