Page 53 - Hardware Implementation of Finite-Field Arithmetic
P. 53

36     Cha pte r  T w o


                  Thus, s = 2 and
                                     r = (F7) + (77) = (16E)

                  Now reduce (16E):

                                      (16E) = (1)(100) + (6E)
                                    (1)(11) = (0)(100) + (11)

               so that s = 2 and

                                    r = (6E) + (11) = (7F)
                         k
                  As r < 2  it remains to check whether r < m, or not. Actually (7F) <
               (EF) so that the final result is

                              (41C1D298F81A7296) mod (EF) = (7F)
                  Actually x = (41C1D298F81A7296) = (466F352019D159)(EF) + (7F).

                                k
               Algorithm 2.5—mod 2  − a reduction
               r := x mod 2**k; q := x/2**k; a := 2**k - m;
               loop
                 loop -- main loop
                   r := r + (q*a mod 2**k); q := q*a/2**k;
                   if q = 0 then exit; end if;
                 end loop;
                 q := r/2**k; r := r mod 2**k;
                 if q = 0 then exit; end if;
               end loop;
               --final correction;
               if r >= m then z := r-m; else z := r; end if;

               An executable Ada file two_power_k_minus_a_reducer.adb is available
               at www.arithmetic-circuits.org.
                  The datapath corresponding to Algorithm 2.5 is shown in
               Fig.  2.5. In order to get an estimation of the minimum clock
               period, assume that a carry-save multiplier is used. An example
               is shown in Fig. 2.6 with n = 7, k = 3, and t = 5. Every product-cell
               computes two 4-input Boolean functions, namely the 2-bit repre-
               sentation of xy + z + w where x, y, z, and w are the four cell inputs.
               Let T    be the corresponding delay. Then the computation time
                    MULT
               of product(n − 1..k) is equal to kT  + (n − k)T , and the computa-
                                           MULT        FA
               tion time of sum(t − 1..0) to kT  + (t − k + 1 )T . Thus, the mini-
                                         MULT            FA
               mum period of the clock signal is kT  + (n − k)T . An upper
                                                 MULT        FA
               bound is nT   . Assuming that the inner loop of Algorithm 2.5 is
                         MULT
               executed only once (best case), the total computation time is
               about
   48   49   50   51   52   53   54   55   56   57   58