Page 384 - DSP Integrated Circuits

8.4 Ideal DSP Architectures                                          369


        which in turn releases the self-timed module. When the module has finished its
        task, it issues the Done signal. The Done signal can then act as an Enable signal so
        that the host can initiate the next transaction [8].
            A problem with asynchronous and self-timed systems is that the time
        difference between the Enable and Done signals may be so small, or even
        negative, that the handshaking malfunctions. Also, noise in the signals may
        cause errors or put the latches into a metastable state in which neither of the
        outputs is within its low or high voltage range. In theory, a latch may stay in
        this undefined state indefinitely.
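
        The timing hazard described above can be illustrated with a minimal sketch.
        It assumes that a handshake is reliable only if Done follows Enable by at
        least some margin. The margin `min_separation_ns`, the `Transaction` record,
        and the delay values are hypothetical and serve only to show how a too-small
        or negative Enable-to-Done difference would be flagged.

```python
from dataclasses import dataclass


@dataclass
class Transaction:
    t_enable: float  # time at which Enable is asserted (ns)
    t_done: float    # time at which the module asserts Done (ns)


def handshake_ok(tx, min_separation_ns):
    """True if Done follows Enable by at least the assumed margin."""
    return (tx.t_done - tx.t_enable) >= min_separation_ns


def simulate(delays_ns, min_separation_ns=1.0):
    """Chain transactions: each Done acts as the Enable of the next one."""
    t = 0.0
    results = []
    for delay in delays_ns:
        tx = Transaction(t_enable=t, t_done=t + delay)
        results.append(handshake_ok(tx, min_separation_ns))
        t = tx.t_done  # Done of this transaction enables the next one
    return results


# Hypothetical module delays; the second is too small, the third is negative.
checks = simulate([5.0, 0.2, -0.5, 3.0])
```

        In this sketch the second and third transactions are reported as unreliable,
        which corresponds to the malfunctioning handshake mentioned in the text.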

        8.4.7 Autonomous Bit-Serial PEs

        A problem associated with bit-serial PEs is their high clock frequency—e.g., 200 to
        700 MHz. It is not possible to feed these clocks into a chip through the external
        pins, so they must be generated internally.
            One approach is to let each PE generate its own clock by using a private
        ring counter, as illustrated in Figure 8.12. The execution of an operation
        begins by loading an appropriate number of input data into the cache memory
        of the PE, i.e., into a set of shift registers. When the PE has received all
        input data from the memories, the clock is started and the PE is released to
        execute its operation. Because of pipelining, the result of the operation is
        stored into cache memory one operation later.
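
        The load, release, and delayed write-back sequence can be sketched as a toy
        behavioral model. The class `PipelinedPE`, its attribute names, and the
        addition used as the operation are assumptions made for illustration. The
        model works on whole operands rather than on individual bits.

```python
class PipelinedPE:
    """Toy PE whose result reaches the output cache one operation late."""

    def __init__(self, operation):
        self.operation = operation   # hypothetical function computed by the PE
        self.input_regs = []         # stands in for the input shift registers
        self.in_flight = None        # result still inside the pipeline
        self.output_cache = []       # results that have been written back
        self.clock_running = False

    def load(self, operands):
        """Shift all input data into the PE, then start the local clock."""
        self.input_regs = list(operands)
        self.clock_running = True    # released only once all inputs are present

    def execute(self):
        """Run one operation; write back the previous operation's result."""
        if not self.clock_running:
            raise RuntimeError("PE has not been loaded with input data")
        if self.in_flight is not None:
            self.output_cache.append(self.in_flight)
        self.in_flight = self.operation(*self.input_regs)
        self.clock_running = False   # clock stops until the next load


pe = PipelinedPE(lambda a, b: a + b)
for operands in [(1, 2), (3, 4), (5, 6)]:
    pe.load(operands)
    pe.execute()
```

        After three operations only the first two results are in the output cache.
        The third is still in the pipeline and is written back during the next
        operation.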

        Figure 8.12 Self-timed PE with cache memory

            A bit-serial PE requires a fixed number of clock cycles to perform its
        operation. Hence, a counter is used to stop the clock after the appropriate
        number of clock cycles [15]. The requirement on the PE is that it completes its
        operation within the specified time limit. Hence, the PE need only meet a
        minimum speed requirement. The PE operates asynchronously using its local
        clock, and when it has put the result of the operation into cache memory, it
        stops and waits for the next set of input data. A faster PE will not improve the
        throughput but, more importantly, it will not consume more power, since the
        power consumption is proportional to the number of clock cycles per operation.
        Further, the design problem becomes much easier. Instead of designing a
        high-speed circuit with a well-defined speed, it is sufficient to design a
        circuit that is sufficiently fast. The ring counter is designed so that the
        clock frequency is guaranteed to be lower than the maximal operating frequency
        of the PE. Speed variations due to process variations can be minimized if the
        ring counter and PE are placed close to each other on the chip.
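
        The argument about speed, throughput, and power can be made concrete with a
        small sketch. It assumes a fixed energy cost per clock cycle that does not
        depend on the clock frequency. The function `run_operation`, its parameters,
        and the numeric values (16 cycles, 1 pJ per cycle, a 100 ns time limit) are
        hypothetical. Only the 200 to 700 MHz range is taken from the text.

```python
def run_operation(cycles_per_op, ring_freq_hz, pe_max_freq_hz,
                  time_limit_s, energy_per_cycle_j):
    """Model one operation of a PE clocked by its private ring counter."""
    if ring_freq_hz > pe_max_freq_hz:
        raise ValueError("local clock must not exceed the PE's maximum frequency")

    counter = cycles_per_op      # counter that stops the clock at zero
    cycles = 0
    while counter > 0:
        counter -= 1
        cycles += 1              # one local clock cycle

    busy_time_s = cycles / ring_freq_hz
    return {
        "cycles": cycles,
        "busy_time_s": busy_time_s,
        "meets_deadline": busy_time_s <= time_limit_s,
        # Assumption: energy depends only on the number of cycles executed.
        "energy_j": cycles * energy_per_cycle_j,
    }


# Two PEs with different local clock rates, same operation and time limit.
slow = run_operation(16, 200e6, 700e6, 100e-9, 1e-12)
fast = run_operation(16, 400e6, 700e6, 100e-9, 1e-12)
```

        Both PEs meet the time limit and use the same number of cycles, so under
        this assumption they consume the same energy per operation. The faster PE
        simply idles longer while it waits for the next set of input data.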