Page 127 - Hardware Implementation of Finite-Field Arithmetic
P. 127

110    Cha pte r  F o u r


               with sel_ac select next_c <= ZERO when ‘0’, d when
                 others;
               registers_ac: process(clk)
               begin
                if clk’event and clk = ‘1’ then
                 if ce_ac = ‘1’ then a <= next_a; c <= next_c; end if;
                end if;
               end process registers_ac;
               registers_bd: process(clk)
               begin
                if clk’event and clk = ‘1’ then
                 if ce_bd = ‘1’ then b <= next_b; d <= next_d; end if;
                end if;
               end process registers_bd;
                  The complete model additionally includes registers for storing dif
               and min, combinational circuits that compute branching conditions,
               such as min < 0, b mod 4 = 0, dif ≤ 0, and so on, and a control unit.


          4.4 Fermat’s Little Theorem
               Fermat’s little theorem states that if y is a nonzero element of Z , then
                                                                   p
               y p − 1  mod p = 1. In particular, given x in Z , then y(xy p − 2 ) mod p = xy p − 1
                                                 p
               mod p = x. Thus,
                                    z = xy p − 2  mod p             (4.35)

                  A straightforward modification of  Algorithm 3.14 computes
               z = xy p − 2  mod p.

               Algorithm 4.7—mod p division based on Fermat’s little theorem

               e := exp_k;
               tx := mp(x, exp_2k);
               ty := mp(y, exp_2k);
               for i in 1 .. k loop
                e := mp(e, e);
                if binary_p_minus_2(k-i) = 1 then e := mp(e, ty); end if;
               end loop;
               e := mp(e, tx);
               z := mp(e, 1);
                  An executable Ada file Fermat_division.adb, including Algorithm
               4.7, is available at www.arithmetic-circuits.org.
                  The corresponding datapath, similar to Fig. 3.10, is shown in Fig. 4.6.
                  Algorithm 4.7 mainly consists of k executions of a loop that in turn
               includes at most two Montgomery products. According to Eq. (3.24)
               the computation time of a Montgomery product is about 3kT , so
                                                                    FA
               that the total computation time is approximately equal to
                                      T ≈ 6k T                      (4.36)
                                           2
                                             FA
   122   123   124   125   126   127   128   129   130   131   132