Page 97 - Hardware Implementation of Finite-Field Arithmetic
P. 97

80     Cha pte r  T h ree


               adders can be used. The final steps, that is, p = p  + p  and z equal to either
                                                     c  s
               p or p − m, are missing.
                  The minimum clock period of the circuit of Fig.  3.9 is approxi-
               mately 2T . The number of clock cycles is equal to k so that the total
                       FA
               computation time, without the final operations, is about 2kT . The
                                                                   FA
               final steps consist of computing p = p  + p and p  −  m, where p  and p
                                              c  s                 c    s
               are (k + 1)-bit numbers and m a k-bit number. If carry-propagate
               adders are used, the corresponding computation time is about kT .
                                                                       FA
               Thus, the total computation time is approximately equal to
                                    T ≈ 2kT + kT                    (3.24)
                                          FA   FA
                  In Eq. (3.24) the second term  kT  corresponds to the final
                                                FA
               operations which are not executable in one clock cycle.
                  A complete VHDL file Montgomery_multiplier.vhd is available
               at www.arithmetic-circuits.org. The entity declaration is
               entity Montgomery_multiplier is
               port (
                 x, y: in std_logic_vector(K-1 downto 0);
                 clk, reset, start: in std_logic;
                 z: out std_logic_vector(K-1 downto 0);
                 done: out std_logic
               );
               end Montgomery_multiplier;
                  The VHDL architecture corresponding to the circuit of Fig. 3.9 is
               the following:
               and_gates: for i in 0 to k-1 generate
                 y_by_xi(i) <= y(i) and xi;
               end generate;
               y_by_xi(K) <= ‘0’;
               first_csa: for i in 0 to k generate
                 as(i) <= pc(i) xor ps(i) xor y_by_xi(i);
                 ac(i+1) <= (pc(i) and ps(i)) or (pc(i) and y_by_xi(i))
                   or (ps(i) and y_by_xi(i));
               end generate;
               ac(0) <= ‘0’; as(K+1) <= ‘0’;
               long_m <= “00”&m;
               second_csa: for i in 0 to k generate
                 bs(i) <= ac(i) xor as(i) xor long_m(i);
                 bc(i+1) <= (ac(i) and as(i)) or (ac(i) and long_m(i))
                   or (as(i) and long_m(i));
               end generate;
               bc(0) <= ‘0’; bs(K+1) <= ac(K+1);
               half_as <= as(K+1 downto 1); half_ac <= ac(K+1 downto 1);
               half_bs <= bs(K+1 downto 1); half_bc <= bc(K+1 downto 1);
               with as(0) select next_pc <= half_ac when ‘0’, half_bc
                 when others;
               with as(0) select next_ps <= half_as when ‘0’, half_bs when
                 others;
   92   93   94   95   96   97   98   99   100   101   102