Page 74 - Hardware Implementation of Finite-Field Arithmetic
P. 74

mod  m  Reduction    57


                                         ik
               2.7.4  Precomputation of 2  mod m
               The number of cycles depends on the value of  x. The cost and
               minimum period of several reducers are shown in Table 2.7.



                N       K       FFs    LUTs     Mult    Slices  Period
                16      8       36     84       1       44      10.4
                24      8       45     105      1       55      10.4
                32      8       55     115      1       61      10.4
                64      8       92     171      1       93      10.8
                64      32      103    233      3       139     22.0
                128     64      205    583      10      339     24.0
                256     128     408    1,643    36      972     34.3
                512     256                             ∞

               TABLE 2.7  Cost and Period of Reducers with Precomputation


                  In order to get an estimation of the computation time, 1,000 pairs
               (x,  m) have been generated for several values of  k and  n, and the
               corresponding numbers of cycles have been observed by simulation.
               The minimum (MinCycles), maximum (MaxCycles), and average
               (AverCycles) numbers of cycles have been obtained, and the average
               computation time (AverTime) has been computed (Table 2.8).



                N   K   Period  MinCycles  MaxCycles  AverCycles  AverTime
                16  8   10.4   6         18         12.01      125
                64  8   10.8   22        52         37.21      401.9

               TABLE 2.8  Average Delay of Reducers with Precomputation


                  Observe that if n = 2k, and thus s = 2, the look-up table only stores
                           k
               b  = 1 and b  = 2  mod m. It is no longer necessary to use multipliers for
                0       1
               computing  vector_r(sel)  .  b . Specific combinational circuits can be
                                      sel
                                                             64
                                                        192
               used. As an example, a 384-bit to 192-bit mod (2  − 2  − 1) reducer
                                                             64
                                                        192
                                                192
                                                                    64
               has been implemented. In this case b  = 2  mod (2  − 2  − 1) = 2  + 1.
                                             1
               The implementation results are the following.
          FFs  LUTs   Slices  Period  MinCycles  MaxCycles  AverCycles  AverTime
          674  16,574  8,559  46.9  10      10        10        469
   69   70   71   72   73   74   75   76   77   78   79