Page 127 - Hardware Implementation of Finite-Field Arithmetic

P. 127

110 Cha pte r F o u r

with sel_ac select next_c <= ZERO when ‘0’, d when
others;
registers_ac: process(clk)
begin
if clk’event and clk = ‘1’ then
if ce_ac = ‘1’ then a <= next_a; c <= next_c; end if;
end if;
end process registers_ac;
registers_bd: process(clk)
begin
if clk’event and clk = ‘1’ then
if ce_bd = ‘1’ then b <= next_b; d <= next_d; end if;
end if;
end process registers_bd;
The complete model additionally includes registers for storing dif
and min, combinational circuits that compute branching conditions,
such as min < 0, b mod 4 = 0, dif ≤ 0, and so on, and a control unit.

4.4 Fermat’s Little Theorem
Fermat’s little theorem states that if y is a nonzero element of Z , then
p
y p − 1 mod p = 1. In particular, given x in Z , then y(xy p − 2 ) mod p = xy p − 1
p
mod p = x. Thus,
z = xy p − 2 mod p (4.35)

A straightforward modification of Algorithm 3.14 computes
z = xy p − 2 mod p.

Algorithm 4.7—mod p division based on Fermat’s little theorem

e := exp_k;
tx := mp(x, exp_2k);
ty := mp(y, exp_2k);
for i in 1 .. k loop
e := mp(e, e);
if binary_p_minus_2(k-i) = 1 then e := mp(e, ty); end if;
end loop;
e := mp(e, tx);
z := mp(e, 1);
An executable Ada file Fermat_division.adb, including Algorithm
4.7, is available at www.arithmetic-circuits.org.
The corresponding datapath, similar to Fig. 3.10, is shown in Fig. 4.6.
Algorithm 4.7 mainly consists of k executions of a loop that in turn
includes at most two Montgomery products. According to Eq. (3.24)
the computation time of a Montgomery product is about 3kT , so
FA
that the total computation time is approximately equal to
T ≈ 6k T (4.36)
2
FA

122 123 124 125 126 127 128 129 130 131 132