Page 87 - Hardware Implementation of Finite-Field Arithmetic
P. 87
70 Cha pte r T h ree
control_unit: process(clk, reset)
begin
case current_state is
when 0 to 1 => done <= ‘1’; start2 <= ‘0’;
when 2 => done <= ‘0’; start2 <= ‘0’;
when 3 => done <= ‘0’; start2 <= ‘1’;
when 4 => done <= ‘0’; start2 <= ‘0’;
end case;
start1 <= start;
if reset = ‘1’ then current_state <= 0;
elsif clk’event and clk = ‘1’ then
case current_state is
when 0 => if start = ‘0’ then current_state <= 1; end if;
when 1 => if start = ‘1’ then current_state <= 2;
end if;
when 2 => if done1 = ‘1’ then current_state <= 3;
end if;
when 3 => current_state <= 4;
when 4 => if done2 = ‘1’ then current_state <= 0;
end if;
end case;
end if;
end process;
The total computation time is the sum of the computation times
of both blocks. The minimum clock period of the circuit of Fig. 3.5 is
approximately equal to T , and the minimum clock period of the
FA
circuit of Fig. 2.4 is about 6T [Eq. (2.19)]. The multiplication takes
FA
k cycles and the SRT reduction n − k + 1 = k + 1 cycles, plus (k + 1)T
FA
time units for the final steps [Eq. (2.20)]. Thus the total number of
cycles is approximately equal to 2k, the cycle duration about 6T , and
FA
the total computation time is
T ≈ 12kT + kT (3.7)
FA FA
In Eq. (3.7) the term kT corresponds to the final operations; they
FA
are not executable in one clock cycle, and a comment similar to
Comment 2.1 must be done.
3.4.2 Double, Add, and Reduce
Th is section describes circuits based on the Interleaving Multiplication
Algorithm ([RSDK06]). Given a k-bit natural x and a natural y the
product z = xy can be computed as follows:
+
0
xy = (x · 2 k − 1 + x · 2 k − 2 . . . + x · 2 )y = ( . . . ((0 · 2 + x y)2
k − 1 k − 2 0 k − 1
+ x y)2+ . . . + x y)2 + x y (3.8)
k − 2 1 0
If all operations (addition and doubling) are executed mod m, the
result is product = xy mod m. The corresponding (left to right) algorithm is