Page 97 - Hardware Implementation of Finite-Field Arithmetic
P. 97
80 Cha pte r T h ree
adders can be used. The final steps, that is, p = p + p and z equal to either
c s
p or p − m, are missing.
The minimum clock period of the circuit of Fig. 3.9 is approxi-
mately 2T . The number of clock cycles is equal to k so that the total
FA
computation time, without the final operations, is about 2kT . The
FA
final steps consist of computing p = p + p and p − m, where p and p
c s c s
are (k + 1)-bit numbers and m a k-bit number. If carry-propagate
adders are used, the corresponding computation time is about kT .
FA
Thus, the total computation time is approximately equal to
T ≈ 2kT + kT (3.24)
FA FA
In Eq. (3.24) the second term kT corresponds to the final
FA
operations which are not executable in one clock cycle.
A complete VHDL file Montgomery_multiplier.vhd is available
at www.arithmetic-circuits.org. The entity declaration is
entity Montgomery_multiplier is
port (
x, y: in std_logic_vector(K-1 downto 0);
clk, reset, start: in std_logic;
z: out std_logic_vector(K-1 downto 0);
done: out std_logic
);
end Montgomery_multiplier;
The VHDL architecture corresponding to the circuit of Fig. 3.9 is
the following:
and_gates: for i in 0 to k-1 generate
y_by_xi(i) <= y(i) and xi;
end generate;
y_by_xi(K) <= ‘0’;
first_csa: for i in 0 to k generate
as(i) <= pc(i) xor ps(i) xor y_by_xi(i);
ac(i+1) <= (pc(i) and ps(i)) or (pc(i) and y_by_xi(i))
or (ps(i) and y_by_xi(i));
end generate;
ac(0) <= ‘0’; as(K+1) <= ‘0’;
long_m <= “00”&m;
second_csa: for i in 0 to k generate
bs(i) <= ac(i) xor as(i) xor long_m(i);
bc(i+1) <= (ac(i) and as(i)) or (ac(i) and long_m(i))
or (as(i) and long_m(i));
end generate;
bc(0) <= ‘0’; bs(K+1) <= ac(K+1);
half_as <= as(K+1 downto 1); half_ac <= ac(K+1 downto 1);
half_bs <= bs(K+1 downto 1); half_bc <= bc(K+1 downto 1);
with as(0) select next_pc <= half_ac when ‘0’, half_bc
when others;
with as(0) select next_ps <= half_as when ‘0’, half_bs when
others;