Page 356 - DSP Integrated Circuits
P. 356

7.11 FFT Processor, Cont.                                            341

        and nine memory cells, respectively. Note that the left-edge algorithm does not in
        general lead to minimum memory size. The required access rate for the memories is




           The lifetime will overlap itself in the folded lifetime table if it is longer than
        the scheduling period. In principle, this problem can be solved in either of two
        ways: either by allowing preemption or by duplicating the memory resources used
        to store these values. Preemption of storage requires that values are transferred
        between memory cells. Of course, this is not an efficient scheme.
           The second solution is similar to duplicating resources in an operation sched-
        ule when operations are longer than the scheduling period. The longest storage
        time in this case is only 347^ long. Hence, this is not a problem in this case.


        7.11 FFT PROCESSOR, CONT.

        In this section, we discuss various alternatives in assigning processes to hardware
        resources. In particular we will assign butterfly operations to PEs and data to
        memories. The assignments will differ with respect to complexity of the control
        structures in the architecture and the interconnection network that connects the
        PEs and the memories. In fact, the scheduling, resource allocation, and assign-
        ment steps specify the main features of the corresponding class of optimal archi-
        tectures. We will therefore defer evaluation of the alternatives to Chapter 9.

        7.11.1 Memory Assignment

        The FFT algorithm allows in-place computation—i.e., the results of a butterfly
        operation can always be written into the memory cells that stored the inputs to
        the butterfly. A value in the memory is used as input to one butterfly operation
        only. It is therefore necessary to store only N complex values in the memories.
           Because of the speed requirement we have already decided to use two logical
        memories (RAMs). In practice, large RAMs are implemented as a set of smaller
        memories.
        Memory Assignment 1
        There are many feasible memory assignments for the ST-FFT. It is not practically
        possible to evaluate all assignments. We will here investigate two alternative mem-
        ory assignments for the ST-FFT. The first assigns the first N/2 variables (0 < i < M2)
        to RAMo and the remaining values (M2 < i < N) to RAMi. The assignment for a
        16-point FFT is illustrated in Figure 7.83. However, this assignment requires a
        trivial modification of the schedule shown in Figure 7.57.
           The FFT is a divide-and-conquer algorithm—i.e., the FFT is recursively
        divided into smaller and smaller FFTs. However, the first stage of the FFT differs
        from the other stages. This memory assignment will not cause any difficulties in
        the second to fourth stages of the FFT, since both input values to a butterfly opera-
        tion will be read from the same memory. However, in the first stage, a butterfly
        operation must receive a value from each of the memories. This implies a switching
        function in the interconnection network. The switching will also require some con-
        trol circuitry. These factors will influence the cost of the implementation.
   351   352   353   354   355   356   357   358   359   360   361