Page 90 - Hardware Implementation of Finite-Field Arithmetic
P. 90
mod m Operations 73
In order to reduce the computation time [Eq. (3.9)], the stored-
carry encoding principle can be used. For that, Algorithm 3.7 is
modified: p is represented under the form p = p + p ; the sum p + y is
s c
computed in carry-stored form, that is, p + p + y = w + w (csa_sum
s c s c
and csa_carry functions, Sec. 2.1.3); every operation (multiply by 2 or
sum y) is followed by one step of SRT-reduction (Sec. 2.1.3); the final
value of p = p + p belongs to the interval − m ≤ p < m, so that the final
c s
result is either p or p + m.
Algorithm 3.8—Double, add, and reduce with carry-stored encoding
ps := 0; pc := 0;
for i in 0 .. k-1 loop
--doubling
ws := (2*ps) mod 2**(k+2); wc := (2*pc) mod 2**(k+2);
--SRT reduction
if quotient(ws, wc) = 1 then
ps := csa_sum(ws, wc, minus_m);
pc := csa_carry(ws, wc, minus_m);
elsif quotient(ws, wc) = 0 then
ps := csa_sum(ws, wc, 0); pc := csa_carry(ws, wc, 0);
else ps := csa_sum(ws, wc, m);
pc := csa_carry(ws, wc, m);
end if;
--adding
if binary_x(k-i-1) = 1 then
ws := csa_sum(ps, pc, y); wc := csa_carry(ps, pc, y);
--SRT reduction
if quotient(ws, wc) = 1 then
ps := csa_sum(ws, wc, minus_m);
pc := csa_carry(ws, wc, minus_m);
elsif quotient(ws, wc) = 0 then
ps := csa_sum(ws, wc, 0);
pc := csa_carry(ws, wc, 0);
else ps := csa_sum(ws, wc, m);
pc := csa_carry(ws, wc, m);
end if;
end if;
end loop;
p := (ps + pc) mod 2**(k+1);
if p >= 2**k then p := (p + m) mod 2**k; end if;
Part of the corresponding datapath is shown in Fig. 3.8. It must be
completed by the k-bit shift register and the combinational circuit
(generation of condition) of Fig. 3.7, as well as the adders corresponding
to the final operations, that is, p = p + p and p + m if p is negative. The
c s
minimum clock period is about 2T , the number of clock cycles is equal
FA
to 2k, so that the total computation time, without the final operations, is
approximately equal to 4kT . If carry-propagate adders are used for the
FA
final operations the total computation time is approximately equal to
T ≈ 4kT + kT (3.10)
FA FA