Page 59 - Hardware Implementation of Finite-Field Arithmetic


42 Chapter Two

delay is equal to $kT_{\mathrm{MULT}} + (1 + d - k)T_{\mathrm{FA}}$. An upper bound is $(d + 1)T_{\mathrm{MULT}}$, and an upper bound of $d$ can be calculated as follows:

\[
X_{s-1} b_{s-1} + X_{s-2} b_{s-2} + \cdots + X_1 b_1 + X_0 b_0 < 2^k (b_{s-1} + b_{s-2} + \cdots + b_1 + b_0) < s\,2^{2k} = (n/k)\,2^{2k}
\]

so that an upper bound of $d$ is $2k + \log_2 n - \log_2 k$. Assuming that the inner loop of Algorithm 2.6 is executed only once (best case), the total computation time is about

\[
T \approx s(2k + \log_2 n - \log_2 k + 1)T_{\mathrm{MULT}} \approx 2nT_{\mathrm{MULT}} \tag{2.36}
\]
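As a numerical check of the bound on d and of estimate (2.36), the following Python sketch evaluates both for given n and k. The function names are illustrative and not part of the book's code. The sketch assumes that k divides n, so that s = n/k, and it expresses time in units of T_MULT.

```python
import math


def d_upper_bound(n: int, k: int) -> float:
    """Bound 2k + log2(n) - log2(k) on d, as derived in the text."""
    return 2 * k + math.log2(n) - math.log2(k)


def worst_case_sum_bits(n: int, k: int) -> int:
    """Bit length of the sum if every X_i and b_i were 2^k - 1.

    This over-estimates the real sum (each b_i is below m), so it must
    not exceed the bound returned by d_upper_bound.
    """
    s = n // k  # assumption: k divides n
    largest = (1 << k) - 1
    return (s * largest * largest).bit_length()


def estimated_time(n: int, k: int) -> float:
    """Best-case estimate s * (bound + 1), in units of T_MULT."""
    s = n // k  # assumption: k divides n
    return s * (d_upper_bound(n, k) + 1)


if __name__ == "__main__":
    n, k = 256, 16
    print(d_upper_bound(n, k))        # 36.0
    print(worst_case_sum_bits(n, k))  # 36
    print(estimated_time(n, k))       # 592.0
    print(2 * n)                      # 512
```

For n = 256 and k = 16 the bound on d is 36 and the estimate is 592 T_MULT, against 2n = 512. The dominant term is s · 2k = 2n; the difference comes from the logarithmic term and the +1, which the final approximation in (2.36) drops.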
A VHDL file precomputation_reducer.vhd is available at www.arithmetic-circuits.org. The corresponding entity declaration is:
entity precomputation_reducer is
  port (
    x: in std_logic_vector(N-1 downto 0);
    m: in std_logic_vector(K-1 downto 0);
    clk, reset, start: in std_logic;
    z: out std_logic_vector(K-1 downto 0);
    done: out std_logic
  );
end precomputation_reducer;
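For checking simulation results, a software reference model of what the entity presumably computes (z = x mod m, with an N-bit x and a K-bit m) can be useful. The Python sketch below is such a model under stated assumptions. Algorithm 2.6 is not reproduced in this section, so the sketch assumes that the table holds b_i = 2^(ik) mod m, that the multiply-accumulate pass is repeated until the result fits in k bits, and that a final subtraction loop completes the reduction. The function names are hypothetical.

```python
def precompute_table(m: int, k: int, s: int) -> list:
    """Assumed table contents: b_i = 2^(i*k) mod m for i = 0 .. s-1."""
    return [pow(2, i * k, m) for i in range(s)]


def reduce_with_precomputation(x: int, m: int, n: int, k: int) -> int:
    """Return x mod m using digit-by-table-entry accumulation.

    x is an n-bit operand and m a k-bit modulus; x is split into
    s = ceil(n / k) digits of k bits each.
    """
    if not (0 <= x < (1 << n)) or not (0 < m < (1 << k)):
        raise ValueError("x must fit in n bits and m in k bits")
    s = -(-n // k)  # ceiling division
    table = precompute_table(m, k, s)
    mask = (1 << k) - 1

    r = x
    # Each pass replaces r by sum(X_i * b_i), which is congruent to r
    # modulo m and strictly smaller than r while r has more than k bits.
    while r >> k:
        acc = 0
        for i in range(s):
            digit = (r >> (i * k)) & mask
            acc += digit * table[i]
        r = acc

    # Final correction: r now fits in k bits but may still exceed m.
    while r >= m:
        r -= m
    return r


if __name__ == "__main__":
    x, m = 0x0123456789ABCDEF, 65521
    assert reduce_with_precomputation(x, m, n=64, k=16) == x % m
```

Comparing this model against `x % m` over many operands gives expected values for a testbench without depending on the internal signals of the circuit.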

The VHDL architecture corresponding to the circuit of Fig. 2.7 is
the following:

-- Split r into s digits of K bits each.
digit_selection: for i in 0 to s-1 generate
  vector_r(i) <= r((i+1)*K-1 downto i*K);
end generate;

-- Select the current digit and the matching table entry,
-- then multiply and accumulate.
vector_i <= vector_r(conv_integer(sel));
b_i <= b_table(conv_integer(sel));
product <= vector_i * b_i;
next_acc <= acc + product;

-- Accumulator: cleared on load or reload, updated when ce_acc = '1'.
register_acc: process(reset, clk)
begin
  if reset = '1' then acc <= (others => '0');
  elsif clk'event and clk = '1' then
    if load = '1' or reload = '1' then acc <= (others => '0');
    elsif ce_acc = '1' then acc <= next_acc;
    end if;
  end if;
end process register_acc;

-- Register r: loaded with x, or reloaded with acc prefixed by ZERO.
register_r: process(reset, clk)
begin
  if reset = '1' then r <= (others => '0');
  elsif clk'event and clk = '1' then
    if load = '1' then r <= x;
    elsif reload = '1' then r <= ZERO & acc;
    end if;
  end if;
