Page 53 - Hardware Implementation of Finite-Field Arithmetic
P. 53
36 Cha pte r T w o
Thus, s = 2 and
r = (F7) + (77) = (16E)
Now reduce (16E):
(16E) = (1)(100) + (6E)
(1)(11) = (0)(100) + (11)
so that s = 2 and
r = (6E) + (11) = (7F)
k
As r < 2 it remains to check whether r < m, or not. Actually (7F) <
(EF) so that the final result is
(41C1D298F81A7296) mod (EF) = (7F)
Actually x = (41C1D298F81A7296) = (466F352019D159)(EF) + (7F).
k
Algorithm 2.5—mod 2 − a reduction
r := x mod 2**k; q := x/2**k; a := 2**k - m;
loop
loop -- main loop
r := r + (q*a mod 2**k); q := q*a/2**k;
if q = 0 then exit; end if;
end loop;
q := r/2**k; r := r mod 2**k;
if q = 0 then exit; end if;
end loop;
--final correction;
if r >= m then z := r-m; else z := r; end if;
An executable Ada file two_power_k_minus_a_reducer.adb is available
at www.arithmetic-circuits.org.
The datapath corresponding to Algorithm 2.5 is shown in
Fig. 2.5. In order to get an estimation of the minimum clock
period, assume that a carry-save multiplier is used. An example
is shown in Fig. 2.6 with n = 7, k = 3, and t = 5. Every product-cell
computes two 4-input Boolean functions, namely the 2-bit repre-
sentation of xy + z + w where x, y, z, and w are the four cell inputs.
Let T be the corresponding delay. Then the computation time
MULT
of product(n − 1..k) is equal to kT + (n − k)T , and the computa-
MULT FA
tion time of sum(t − 1..0) to kT + (t − k + 1 )T . Thus, the mini-
MULT FA
mum period of the clock signal is kT + (n − k)T . An upper
MULT FA
bound is nT . Assuming that the inner loop of Algorithm 2.5 is
MULT
executed only once (best case), the total computation time is
about