Page 278 - DSP Integrated Circuits

6.9 Algorithm Transformations                                        263

            The operation rate is minimized if $L = \sqrt{N(2N - 1)} \approx \sqrt{2}\,N$, which
        corresponds to approximately $2(2 + \sqrt{2})N$ operations per sample. Hence, by using a
        large number of PEs, many times larger than the filter order, block structures can
        produce several output values per multiply-add cycle.
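The location of this optimum can be checked numerically. The following Python sketch rests on a cost model that is an assumption and is not stated in this section: one block of L outputs is computed as a matrix-vector product with an (N + L) x (N + L) block state-space matrix whose lower-right L x L part is lower triangular, and both multiplications and additions are counted. This gives 2N^2 - N + 4NL + L^2 operations per block, and the minimum per sample then falls at L = sqrt(N(2N - 1)), roughly sqrt(2) N. The function names are hypothetical.

```python
import math


def ops_per_sample(n, block_len):
    """Multiplications plus additions per output sample (assumed cost model)."""
    # Assumption: dense N x N, N x L and L x N parts plus a lower triangular
    # L x L part, counting every multiplication and every addition once.
    ops_per_block = 2 * n * n - n + 4 * n * block_len + block_len * block_len
    return ops_per_block / block_len


def best_block_length(n, max_len=None):
    """Brute-force search for the integer block length with the lowest cost."""
    max_len = max_len or 4 * n
    return min(range(1, max_len + 1), key=lambda length: ops_per_sample(n, length))


n = 8                                  # hypothetical filter order
l_opt = math.sqrt(n * (2 * n - 1))     # continuous optimum, about sqrt(2) * n
l_int = best_block_length(n)           # best integer block length

print(l_opt, l_int, ops_per_sample(n, l_int))
```

For N = 8 the search returns L = 11, next to the continuous optimum of about 10.95, and the cost per sample stays just below 2(2 + sqrt(2))N.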
            An advantage of block processing is that the round-off noise is reduced and
        that overflow oscillations can be avoided in some cases [2]. It is well known that
        conventional structures with low round-off noise have low coefficient sensitivity;
        see Equations (5.20) and (5.21). However, block structures implicitly depend on
        pole-zero cancellations, and the cancellations may not occur in the presence of
        coefficient errors. Thus, block structures may have exceedingly high coefficient
        sensitivity [1]. Further, coefficient errors may change the nominal time-invariant
        filter into a time-variant one. Notice also that the numerical properties of the
        original algorithm are not retained in block processing and that most algorithms do
        not have a
        state-space representation numerically equivalent to the original algorithm.
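The remark on time variance can be illustrated with a small experiment. The Python sketch below is an assumption-laden toy case, not a structure taken from the text: a first-order recursion y(n) = a y(n-1) + x(n) is processed in blocks of two samples, so that the block form needs both a and a^2 as stored coefficients. If the two are rounded separately, the response to an impulse depends on whether the impulse arrives at an even or an odd sample, which means the filter is no longer time-invariant. All names and the 3-bit word length are hypothetical.

```python
def block_first_order(x, a1, a2):
    """Filter x with y(n) = a*y(n-1) + x(n) in blocks of two samples.

    a1 stands in for a and a2 for a**2, held as separate coefficients.
    The input length is assumed to be even.
    """
    y, state = [], 0.0  # state is the last output of the previous block
    for m in range(0, len(x) - 1, 2):
        y0 = a1 * state + x[m]
        y1 = a2 * state + a1 * x[m] + x[m + 1]
        y += [y0, y1]
        state = y1
    return y


def quantize(value, frac_bits):
    """Round value to a grid with frac_bits fractional bits."""
    step = 2.0 ** -frac_bits
    return round(value / step) * step


def is_shift_invariant(a1, a2, length=8):
    """Compare the responses to impulses at n = 0 and at n = 1."""
    even = [1.0] + [0.0] * (length - 1)
    odd = [0.0, 1.0] + [0.0] * (length - 2)
    h_even = block_first_order(even, a1, a2)
    h_odd = block_first_order(odd, a1, a2)
    return all(abs(p - q) < 1e-12 for p, q in zip(h_even, h_odd[1:]))


a = 0.7
a1 = quantize(a, 3)       # 0.75
a2 = quantize(a * a, 3)   # 0.5, which differs from a1**2 = 0.5625

print(is_shift_invariant(a1, a1 * a1))  # consistent coefficients
print(is_shift_invariant(a1, a2))       # separately rounded coefficients
```

With consistent coefficients (a2 equal to a1 squared) the two impulse responses coincide up to a one-sample shift; with the separately rounded pair they do not.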
            Block processing is particularly suitable for implementation on processors
        that support vector multiplication. Block processing of FIR and certain IIR filters
        (for example, lattice wave digital filters with branches realized using Richards'
        structures) can be implemented efficiently. The use of a state-space representation
        is particularly suited to digital filters having a lower triangular or quasi-triangular
        state matrix. Such state matrices can be obtained either by an orthogonal similarity
        transformation of any state matrix or, directly, by using a numerically
        equivalent state-space representation of a wave digital circulator structure. Block
        processing is also suitable for decimation and interpolation of the sample rate.
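For the FIR case, the fit with vector multiplication can be sketched as follows. In this Python illustration, which is a minimal sketch and not an implementation from the text, every output in a block is an inner product of the coefficient vector with a window of the input. Since no output in the block depends on another output, the inner products of one block could be issued to parallel vector units. The function names and test data are hypothetical.

```python
def dot(u, v):
    """Inner product of two equally long sequences."""
    return sum(p * q for p, q in zip(u, v))


def fir_direct(h, x):
    """Reference: sample-by-sample convolution sum."""
    y = []
    for n in range(len(x)):
        y.append(sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0))
    return y


def fir_block(h, x, block_len):
    """Compute the same outputs block by block as independent inner products."""
    taps = len(h)
    padded = [0.0] * (taps - 1) + list(x)   # zero initial conditions
    y = []
    for start in range(0, len(x), block_len):
        stop = min(start + block_len, len(x))
        # The inner products below are mutually independent within a block.
        for n in range(start, stop):
            window = padded[n:n + taps][::-1]   # x(n), x(n-1), ..., x(n-taps+1)
            y.append(dot(h, window))
    return y


h = [0.5, 0.25, 0.125]
x = [1.0, 2.0, 3.0, 4.0, 5.0]
print(fir_block(h, x, block_len=2))
```

The block length only changes how the work is grouped; the output sequence is the same as for the sample-by-sample form.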


        6.9.2 Clustered Look-Ahead Pipelining
        In this section we will discuss the clustered look-ahead pipelining technique [16,
        18, 26]. For simplicity we will consider only the recursive part of an algorithm,
        since the nonrecursive parts do not limit the sample rate. Further, we assume that
        the algorithm has only one such part described by





            The sample rate is limited by the time for one multiplication and one addition.
        The corresponding transfer function is






            We modify the transfer function by multiplying both the denominator and
        numerator by the same polynomial:
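A minimal Python sketch of this step follows. It assumes that the recursive part has the form y(n) = x(n) + b_1 y(n-1) + ... + b_N y(n-N), so that the transfer function is 1 / (1 - b_1 z^-1 - ... - b_N z^-N); the equations of this section are not reproduced here, so this form, the symbols b, c and m, and all function names are assumptions. The multiplying polynomial C(z) = 1 + c_1 z^-1 + ... + c_m z^-m is chosen so that the terms z^-1 to z^-m of the new denominator vanish, which moves the nearest feedback tap at least m + 1 delays away.

```python
def convolve(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def lookahead_polynomial(b, m):
    """Return [1, c_1, ..., c_m] that cancels z^-1..z^-m in the new denominator."""
    c = [1.0]
    for j in range(1, m + 1):
        c.append(sum(b[i - 1] * c[j - i] for i in range(1, min(j, len(b)) + 1)))
    return c


def iir(num, den, x):
    """Direct-form recursion with den[0] == 1 and zero initial conditions."""
    y = []
    for n in range(len(x)):
        acc = sum(num[i] * x[n - i] for i in range(len(num)) if n - i >= 0)
        acc -= sum(den[k] * y[n - k] for k in range(1, len(den)) if n - k >= 0)
        y.append(acc)
    return y


b = [1.0, -0.5]                      # hypothetical: y(n) = y(n-1) - 0.5 y(n-2) + x(n)
d = [1.0] + [-bk for bk in b]        # original denominator 1 - z^-1 + 0.5 z^-2
c = lookahead_polynomial(b, 2)       # multiplying polynomial
den = convolve(d, c)                 # new denominator

impulse = [1.0] + [0.0] * 7
h_original = iir([1.0], d, impulse)
h_modified = iir(c, den, impulse)
print(c, den)
```

For these values the new denominator is 1 + 0.25 z^-4. With m = 2 only the z^-1 and z^-2 terms are guaranteed to vanish; the z^-3 term happens to be zero for this particular choice of b. Both recursions produce the same impulse response, while the modified one has four delays in its loop instead of one.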