Page 48 - Hardware Implementation of Finite-Field Arithmetic
P. 48

mod  m  Reduction    31


               be the 4-bit naturals corresponding to the four most significant bits of
               s ’ and s ’, and compute t = s ’ + s ’ mod 16. Then the value of q can be
                s     c               s   c
               defined as a function of t: q = 1 if t ≤ 2, q = − 1 if 11 ≤ t ≤ 14, q = 0 if
               t = 15, and t never belongs to the interval 3 ≤ t ≤ 10.


               Algorithm 2.3 —Computation of q-SRT algorithm with stored-carry encoding
               function quotient(ss, sc, y: in natural) return integer is
                 ss_high, sc_high, t: natural;
               begin
                 ss_high := ss / (2**(n-1));
                 sc_high := sc / (2**(n-1));
                 t := (ss_high+sc_high) mod 16;
                 if t <= 2 then return 1;
                 elsif t < 15 then return -1;
                 else return 0;
                 end if;
               end quotient;
               An executable Ada file srt_reducer.adb including Algorithm 2.2 as
               well as the functions csa_sum, csa_carry, and quotient (Algorithm 2.3)
               is available at www.arithmetic-circuits.org.
                  The datapath corresponding to Algorithm 2.2, with the quotient
               function defined by Algorithm 2.3, is shown in Fig. 2.4. The Boolean
               functions defining q (2’s complement) are


                              q =  t ⊕  tt t  q =  tt t             (2.18)
                                       21 0
                                   3
                               1
                                                   21 0
                                               0
                  The minimum period of the clock signal is equal to the delay of a
               4-bit adder, plus the computation time of a 4-input Boolean function,
               plus the delay of a 1-bit adder, that is, 5T  + T  . Assuming that T
                                                FA   Boolean4          FA
               and T     have the same order of magnitude, the minimum clock
                    Boolean4
               period is approximately equal to
                                      T   ≈ 6T                      (2.19)
                                       CLK   FA
                  A lower bound of the total computation time, including the final
               steps (decoding from stored-carry form to normal form and correction
               if z < 0), is given by


                   T = (n − k + 1)T   + (k + 1)T  ≈ 6(n − k + 1)T  + (k + 1)T  (2.20)
                               CLK       FA            FA       FA
               that is a linear function of n and k instead of a quadratic function, as
               in the case of Algorithm 2.1 [Eq. (2.14)]. In Eq. (2.20) the term (k + 1)T
                                                                       FA
               corresponds to the final operations; they are not executable in one
               clock cycle.
   43   44   45   46   47   48   49   50   51   52   53