Page 189 - Hardware Implementation of Finite-Field Arithmetic
P. 189
170 Cha pte r Se v e n
Then the product given in Eq. (7.9) can be obtained by:
d x () = x M () + x m/2 [ M () + M () + M ()]x + M ( ) () (7.11)
() 1
(1))
() 1
1
() 1
m
x
x
x
x
2 1 0 2 0
The algorithm becomes recursive if it is applied again to the
polynomials given in Eq. (7.10). The next iteration step splits the
polynomials A , B , A , B , (A + A ), and (B + B ) again in half. With
L L H H L H L H
(2)
these newly halved polynomials, new auxiliary polynomials M (x)
can be defined in a similar way to Eq. (7.10). The algorithm eventually
(t)
terminates after t steps. In the final step the polynomials M (x) are
degenerated into single coefficients. Since every step halves the
number of coefficients, the algorithm terminates after t = log m
2
steps.
A VHDL model for the Karatsuba-Ofman multiplication (for m
even) is given in the file Karatsuba_multiplier_even.vhd, which is
available at www.arithmetic-circuits.org. This model includes the
component polynom_multiplier that implements the polynomial
multiplication as given in Section 7.1.1.
entity karatsuba_multiplier_even is
port (
a, b: in std_logic_vector(M-1 downto 0);
d: out std_logic_vector(2*M-2 downto 0)
);
end karatsuba_multiplier_even;
The VHDL architecture is the following:
mult1: polynom_multiplier generic map(M => half_M)
port map(a => a(half_M-1 downto 0),
b => b(half_M-1 downto 0), d=> x0y0);
mult2: polynom_multiplier generic map(M => half_M)
port map(a => a(M-1 downto half_M),
b => b(M-1 downto half_M), d=> x1y1);
mult3: polynom_multiplier generic map(M => half_M)
port map(a => x0_p_X1,
b => y0_p_y1, d=> x01y01);
gen_x0x1y0y1: for i in 0 to half_M-1 generate
x0_p_X1(i) <= a(i) xor a(i + half_M);
y0_p_y1(i) <= b(i) xor b(i + half_M);
end generate;
gen_prod1: for i in 0 to half_M-2 generate
d(half_M+i) <= x01y01(i) xor x0y0(i) xor x1y1(i) xor
x0y0(i+half_M);
end generate;
d(2*half_M-1)<=x01y01(half_M-1) xor x0y0(half_M-1) xor
x1y1(half_M-1);
gen_prod2: for i in half_M to 2*half_M-2 generate
d(half_M+i) <= x01y01(i) xor x0y0(i) xor x1y1(i) xor
x1y1(i-half_M);
end generate;