Page 87 - Hardware Implementation of Finite-Field Arithmetic
P. 87

70    Cha pte r  T h ree


               control_unit: process(clk, reset)
               begin
                 case current_state is
                   when 0 to 1 => done <= ‘1’; start2 <= ‘0’;
                   when 2 => done <= ‘0’; start2 <= ‘0’;
                   when 3 => done <= ‘0’; start2 <= ‘1’;
                   when 4 => done <= ‘0’; start2 <= ‘0’;
                 end case;
                 start1 <= start;
                 if reset = ‘1’ then current_state <= 0;
                 elsif clk’event and clk = ‘1’ then
                   case current_state is
                   when 0 => if start = ‘0’ then current_state <= 1; end if;
                   when 1 => if start = ‘1’ then current_state <= 2;
                     end  if;
                   when 2 => if done1 = ‘1’ then current_state <= 3;
                     end if;
                   when 3 => current_state <= 4;
                   when 4 => if done2 = ‘1’ then current_state <= 0;
                     end if;
                   end case;
                 end if;
               end process;

                  The total computation time is the sum of the computation times
               of both blocks. The minimum clock period of the circuit of Fig. 3.5 is
               approximately equal to  T , and the minimum clock period of the
                                     FA
               circuit of Fig. 2.4 is about 6T  [Eq. (2.19)]. The multiplication takes
                                       FA
               k cycles and the SRT reduction n − k + 1 = k + 1 cycles, plus (k + 1)T
                                                                       FA
               time units for the final steps [Eq. (2.20)]. Thus the total number of
               cycles is approximately equal to 2k, the cycle duration about 6T , and
                                                                   FA
               the total computation time is
                                   T ≈ 12kT  + kT                    (3.7)
                                          FA    FA
                  In Eq. (3.7) the term kT  corresponds to the final operations; they
                                     FA
               are not executable in one clock cycle, and a comment similar to
               Comment 2.1 must be done.


               3.4.2  Double, Add, and Reduce
               Th is section describes circuits based on the Interleaving Multiplication
               Algorithm ([RSDK06]). Given a k-bit natural x and a natural y the
               product z = xy can be computed as follows:

                                         +
                                                  0
                  xy = (x   · 2 k − 1  + x   · 2 k − 2  . . .  + x  · 2 )y = ( . . . ((0 · 2 + x  y)2
                       k − 1     k − 2         0                 k − 1
                     + x   y)2+  . . .  + x y)2 + x y                (3.8)
                         k − 2      1     0
                  If all operations (addition and doubling) are executed mod m, the
               result is product = xy mod m. The corresponding (left to right) algorithm is
   82   83   84   85   86   87   88   89   90   91   92