Page 90 - Hardware Implementation of Finite-Field Arithmetic
P. 90

mod  m  Operations    73


                  In order to reduce the computation time [Eq. (3.9)], the stored-
               carry encoding principle can be used. For that,  Algorithm 3.7 is
               modified: p is represented under the form p = p  + p ; the sum p + y is
                                                       s  c
               computed in carry-stored form, that is, p  + p  + y = w  + w (csa_sum
                                                  s  c       s   c
               and csa_carry functions, Sec. 2.1.3); every operation (multiply by 2 or
               sum y) is followed by one step of SRT-reduction (Sec. 2.1.3); the final
               value of p = p  + p  belongs to the interval  − m ≤ p < m, so that the final
                          c  s
               result is either p or p + m.
               Algorithm 3.8—Double, add, and reduce with carry-stored encoding
               ps := 0;  pc := 0;
               for i in 0 .. k-1 loop
                  --doubling
                  ws := (2*ps) mod 2**(k+2); wc := (2*pc) mod 2**(k+2);
                  --SRT reduction
                  if quotient(ws, wc) = 1 then
                    ps := csa_sum(ws, wc, minus_m);
                    pc := csa_carry(ws, wc, minus_m);
                  elsif quotient(ws, wc) = 0 then
                    ps := csa_sum(ws, wc, 0); pc := csa_carry(ws, wc, 0);
                  else ps := csa_sum(ws, wc, m);
                  pc := csa_carry(ws, wc, m);
                  end if;
                  --adding
                  if binary_x(k-i-1) = 1 then
                     ws := csa_sum(ps, pc, y); wc := csa_carry(ps, pc, y);
                     --SRT reduction
                     if quotient(ws, wc) = 1 then
                        ps := csa_sum(ws, wc, minus_m);
                        pc := csa_carry(ws, wc, minus_m);
                     elsif quotient(ws, wc) = 0 then
                        ps := csa_sum(ws, wc, 0);
                        pc := csa_carry(ws, wc, 0);
                     else ps := csa_sum(ws, wc, m);
                     pc := csa_carry(ws, wc, m);
                     end if;
                  end if;
               end loop;
               p := (ps + pc) mod 2**(k+1);
               if p >= 2**k then p := (p + m) mod 2**k; end if;
                  Part of  the corresponding datapath is shown in Fig. 3.8. It must be
               completed by the  k-bit shift register and the combinational circuit
               (generation of condition) of Fig. 3.7, as well as the adders corresponding
               to the final operations, that is, p = p  + p  and p + m if p is negative. The
                                            c  s
               minimum clock period is about 2T , the number of clock cycles is equal
                                           FA
               to 2k, so that the total computation time, without the final operations, is
               approximately equal to 4kT . If carry-propagate adders are used for the
                                     FA
               final operations the total computation time is approximately equal to
                                    T ≈ 4kT  + kT                   (3.10)
                                          FA   FA
   85   86   87   88   89   90   91   92   93   94   95