Page 278 - DSP Integrated Circuits
6.9 Algorithm Transformations
The operation rate is minimized if $L = \sqrt{N(2N - 1)} \approx \sqrt{2}\,N$, which
corresponds to $N_s \approx 2(2 + \sqrt{2})N$. Hence, by using a large number of PEs,
many times larger than the filter order, block structures can produce several
output values per multiply-add cycle.
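The approximation of the optimal block length can be checked with a short expansion. This is only a sketch of the arithmetic, with $L$ the block length and $N$ the filter order as in the text:

```latex
L = \sqrt{N(2N - 1)}
  = \sqrt{2}\,N\,\sqrt{1 - \frac{1}{2N}}
  \approx \sqrt{2}\,N \qquad (N \gg 1)
```

For example, $N = 8$ gives $\sqrt{120} \approx 10.95$, compared with $\sqrt{2}\cdot 8 \approx 11.31$.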
An advantage of block processing is that the round-off noise is reduced and
that overflow oscillations can be avoided in some cases [2]. It is well known that
conventional structures with low round-off noise have low coefficient sensitivity;
see Equations (5.20) and (5.21). However, block structures implicitly depend on
pole-zero cancellations, and the cancellations may not occur in the presence of
coefficient errors. Thus, block structures may have exceedingly high coefficient
sensitivity [1]. Further, coefficient errors may change the nominal time-invariant
filter into a time-variant one. Notice also that the numerical properties of the
original algorithm are not retained in block processing and that most algorithms do not have a
state-space representation numerically equivalent to the original algorithm.
Block processing is particularly suitable for implementation on processors
that support vector multiplication. Block processing of FIR and certain IIR filters
(for example, lattice wave digital filters with branches realized using Richards'
structures) can be implemented efficiently. The use of a state-space representation
is particularly suited to digital filters having a lower triangular or
quasi-triangular state matrix. Such state matrices can be obtained either by an
orthogonal similarity transformation of any state matrix or, directly, by using a numerically
equivalent state-space representation of a wave digital circulator structure. Block
processing is also suitable for decimation and interpolation of the sample rate.
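As a minimal sketch of block processing for the FIR case only, the Python code below computes a whole block of outputs per pass and compares the result with a sample-by-sample reference. The function names, the coefficient values, and the block length are hypothetical and chosen for illustration; the state-space block form used for IIR filters is not shown.

```python
def fir_direct(h, x):
    """Reference: one output per step, y(n) = sum_k h[k] * x(n - k)."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        y.append(acc)
    return y


def fir_block(h, x, block_len):
    """Block form: each pass consumes block_len inputs and emits block_len outputs.

    The inner products within one pass do not depend on each other, so a
    vector unit or several PEs could evaluate them concurrently.
    """
    taps = len(h)
    history = [0.0] * (taps - 1)   # last taps-1 inputs of the previous block
    y = []
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]
        window = history + block   # every input sample this block needs
        for i in range(len(block)):
            # window[i + taps - 1] is the current input sample of output i
            acc = 0.0
            for k in range(taps):
                acc += h[k] * window[i + taps - 1 - k]
            y.append(acc)
        history = window[len(block):]   # keep the newest taps-1 samples
    return y


# Hypothetical coefficients and input, chosen only for illustration.
h = [0.5, 0.25, -0.125]
x = [float(v) for v in range(1, 11)]

reference = fir_direct(h, x)
blocked = fir_block(h, x, block_len=4)
max_diff = max(abs(a - b) for a, b in zip(reference, blocked))
print("largest difference between direct and block form:", max_diff)
```

The block length only changes how the work is grouped, not the result, which is why it can be chosen to match the available vector width or number of PEs.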
6.9.2 Clustered Look-Ahead Pipelining
In this section we will discuss the clustered look-ahead pipelining technique [16,
18, 26]. For simplicity we will consider only the recursive part of an algorithm,
since the nonrecursive parts do not limit the sample rate. Further, we assume that
the algorithm has only one such part described by
The sample rate is limited by the time for one multiplication and one addition.
The corresponding transfer function is
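As a sketch in generic notation (the coefficient names $b_k$ and the order $N$ are assumptions and not necessarily the book's symbols), a recursive part of this kind and its transfer function can be written as:

```latex
y(n) = \sum_{k=1}^{N} b_k\, y(n-k) + x(n)

H(z) = \frac{1}{1 - \sum_{k=1}^{N} b_k z^{-k}}
```

In this form the shortest loop, through $b_1$, contains only one delay element, which is why one multiplication and one addition must complete within a sample period.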
We modify the transfer function by multiplying both the denominator and
numerator by the same polynomial:
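A minimal sketch of this step in Python, under assumptions that are not taken from the text: the denominator is written as a coefficient list in powers of $z^{-1}$ with leading coefficient 1, the multiplying polynomial has `m` terms, and it is chosen as the first `m` impulse-response samples of the recursive part, so that the coefficients of $z^{-1}, \ldots, z^{-(m-1)}$ in the new denominator vanish. The helper names and the second-order example coefficients are hypothetical.

```python
def poly_mul(p, q):
    """Multiply two polynomials in z^-1 given as coefficient lists."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def look_ahead_polynomial(den, m):
    """Return an m-term polynomial C(z) so that den(z) * C(z) has zero
    coefficients for z^-1 .. z^-(m-1). Assumes den[0] == 1."""
    c = [1.0]
    for n in range(1, m):
        # n-th impulse-response sample of 1 / den(z)
        acc = 0.0
        for k in range(1, min(n, len(den) - 1) + 1):
            acc -= den[k] * c[n - k]
        c.append(acc)
    return c


def impulse_response(num, den, length):
    """Impulse response of num(z) / den(z) by direct recursion (den[0] == 1)."""
    h = []
    for n in range(length):
        acc = num[n] if n < len(num) else 0.0
        for k in range(1, min(n, len(den) - 1) + 1):
            acc -= den[k] * h[n - k]
        h.append(acc)
    return h


# Hypothetical example: y(n) = 1.2 y(n-1) - 0.5 y(n-2) + x(n)
den = [1.0, -1.2, 0.5]
num = [1.0]
m = 3  # assumed number of delays wanted in the shortest loop

c = look_ahead_polynomial(den, m)
new_den = poly_mul(den, c)   # coefficients of z^-1 and z^-2 become zero
new_num = poly_mul(num, c)

h_old = impulse_response(num, den, 40)
h_new = impulse_response(new_num, new_den, 40)
max_diff = max(abs(a - b) for a, b in zip(h_old, h_new))
print("multiplying polynomial:", c)
print("new denominator:", new_den)
print("largest impulse-response difference:", max_diff)
```

Because the numerator and denominator are multiplied by the same polynomial, the transfer function is nominally unchanged, but the recursion now reaches back at least `m` samples, which leaves room for `m` pipeline stages in the loop.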