Page 211 - Hardware Implementation of Finite-Field Arithmetic
P. 211

m
                             Operations over  GF (2 )—Polynomial Bases      191

               datapath: for i in 1 to M-1 generate
                 new_c(i-1) <= c(i) xor (F(i) and c(0));
               end generate;
               new_c(M-1) <= c(0) xor c(M);
               new_c(2*M-2 downto M) <= ‘0’ & c(2*M-2 downto M+1);
                  The entity declaration of the bit-level Montgomery squaring
               (version 2, Algorithm 7.14) given in the VHDL file montg_comb_
               squarer.vhd is

               entity montgomery_comb_squarer is
               port (
                 a: in std_logic_vector (M-1 downto 0);
                 c: out std_logic_vector (M-1 downto 0)
               );
               end montgomery_comb_squarer;
                  The corresponding VHDL architecture follows:

               cc(0)(0) <= a(0);
               gen_aa: for i in 1 to M-1 generate
                 cc(0)(2*I-1)<= ‘0’; cc(0)(2*I) <= a(i);
               end generate;
               forg: for i in 0 to M-1 generate
                 cell: montg_sq_c_cell port map (C => cc(i), new_c =>
                 cc(i+1));
               end generate;
               c <= cc(M)(M-1 downto 0);
               A sequential VHDL model is also given in the file montgomery_
               square.vhd that is available at www.arithmetic-circuits.org.
                  MSB-first and LSB-first methods for squaring can also be modified
               using Eq. (7.33). For example, LSB-first squaring can be computed as
                                                                       m
               follows [JSP98]. Let a(x) = a  x m − 1  + a  x m − 2  +  . . .  + a x + a  ∈ GF(2 ).
                                     m − 1    m − 2        1    0
                        2
               Then W = a  mod f(x) can be computed in an LSB-first manner as:
                      W =  a =  a  x  2 m−2  + ... +  a x +  a
                           2
                                              2
                               m−1           1    0
                                     )/ ⎥
                                  ( ⎢
                                        +
                        =  ⎡ ⎢ ⎣ a ⎢ ⎣ ( m−12 ⎥ ⎦ x 2 (m−12 ⎦ ... + ax +  a 0 ⎤ ⎥ ⎦  (7.34)
                                  ⎣
                                                2
                                              1
                              /
                              )
                          + a     ⎡ x  ⎣ ⎢ 2 m(  −1 2 ⎦ ⎤ ... + a  x (  2m − 4 2
                                             +
                                        /
                                        )
                                                           x x )
                                          x
                                         ⎥ 2
                                            ⎦
                                 1
                               )/
                            ⎢ ⎢ ⎣ (m−12⎥ ⎦ + ⎣      m −1
               where the least-significant bits of the operand a(x) are processed first.
               The basic computations that can be performed in parallel in step k are
               the following:
                                        x mod
                               a  k ()  =  a  k ( −1 ) 2  f x ()
                                                                    (7.35)
                              W  k ( )  =  W  k ( −1 )  +  a  k ( −1 ) a ⎡ ⎢ ⎢ m/2⎤+−k 1
                                                  ⎥
   206   207   208   209   210   211   212   213   214   215   216