Page 267 - Hardware Implementation of Finite-Field Arithmetic
P. 267

m
                                 Operations over  GF (2 )—Normal Bases      247

                 for i in 0 .. te-1 loop r(i) := yij((i-w(v,k)) mod m,v);
                 end loop;
                 if (m rem 2) = 0 then
                   for i in 0 .. (m/2)-1 loop r(i+m/2) := r(i);
                   end loop;
                 end if;
                 t := m2xvv(t,r);
               end loop;
               c:= m2xvv(c,t);
               where yij_array is defined as an array of bits (0 . . .  m – 1, 1 . . .  m/2)
               used to hold the terms y  defined in Eq. (8.25). An executable Ada file
                                   i,j
               NB_multiplier.adb, including  Algorithm 8.4, is available at www.
               arithmetic-circuits.org.
                  For a bit-parallel implementation of Algorithm 8.4, the y s and
                                                                   i,j
               a b s must be generated using m(m + 1)/2 two-input AND gates and
                i i
               m(m − 1) two-input XOR gates, with corresponding delay T   + T  .
                                                                AND   XOR
               For the addition of  r s and  a b s in order to obtain  c s, a total of
                                  i
                                         i i
                                                              i
               ∑ v − 1  h j  + h v  = C N  − 12  XOR gates will be needed, where the com-
                       ε
                                 )
                           (
                 = j 1
               plexity C = 2( ∑ v − 1 h + ε h + 1)   and e is equal to 1 for m odd, and 0.5
                       N     j=1  j  v
               for m even [RH03a]. If these gates are arranged as a binary tree, then
                                                      ⎡
               the corresponding time complexity will be  ( log  2 (C +  ) 1 − 1⎤ ⎥ )T XOR .
                                                      ⎢
                                                            N
               Since C  is an odd integer,  log C(⎡  +  1)⎤ =⎡ log C ⎤ , thus the overall
                                                        N⎥
                     N                ⎢  2  N   ⎥  ⎢  2
               time complexity [RH03a] of the bit-parallel implementation is
                              + ⎡
               T        =  T AND ⎢ log C ⎤ T  .
                                     N⎥
                  _
                NB multiplier      2    XOR
                  A simple VHDL model for the normal basis multiplication
               algorithm is given in the file NB_multiplier.vhd, available at www.
               arithmetic-circuits.org. This model includes two processes for the
               computation of the product, where the y s must be generated. The
                                                  i,j
               datapath for this generation is shown in Fig. 8.2.
                  The entity declaration of the combinational implementation of
               the normal basis multiplier given in the file NB_multiplier.vhd is
               entity NB_multiplier is
               port (
                 a, b: in std_logic_vector(M-1 downto 0);
                 c: out std_logic_vector(M-1 downto 0)
               );
               end NB_multiplier;
                  The corresponding VHDL architecture follows:
               P1: process(a,b)
                 variable yij: yij_array;
                 variable caux,r,t1: std_logic_vector(M-1 downto 0);
                 variable s,te,aux1: integer;
               begin
                 for i in 0 to m-1 loop
                   for j in 1 to v loop
   262   263   264   265   266   267   268   269   270   271   272