Page 76 - Hardware Implementation of Finite-Field Arithmetic

P. 76

mod m Reduction 59

64
n = 64 and m = 239: c = ⎣2 /239⎦ = 112358E75D30336 (hexadecimal),
192
384
64
192
64
n = 384 and m = 2 − 2 − 1: c = ⎣2 /(2 − 2 − 1)⎦ = 2 + 2 + 1,
64
192
256
96
256
512
n = 512 and m = 2 − 2 224 + 2 192 + 2 − 1: c = ⎣2 /(2 − 2 224 + 2 192
96
+ 2 − 1)⎦
= 100000000FFFFFFFFFFFFFFFEFFFFFFFEFFFFFFFEFFFFFFFF000
0000000000003 (hexadecimal)
Obviously, the multiplication by c is very simple in the second
case, and much more complex in the other ones.
2.7.6 Specific Circuits
The 16-bit to 8-bit mod 239 reducer of Fig. 2.10, and the 384-bit to 192-
192
bit mod (2 − 2 − 1) reducers of Figs 2.11 and 2.12 have been
64
implemented (Table 2.11).
N K M LUTs Slices Total time
16 8 239 63 37 17.1
384 192 2 192 − 2 − 1 1,033 931 30
64
64
384 192 2 192 − 2 − 1 648 642 45
TABLE 2.11 Cost and Delay of Combinational Specific Reducers

2.8 Comments and Conclusions
According to the implementation results, the following conclusions
are obtained.

1. For nonfixed-m reducers, the more cost-effective are those
based on integer divisions, that is, the nonrestoring and the
k
SRT reducers. The mod 2 − a reducer uses more resources
(slices and multipliers) and the same occurs with the pre-
computation-based reducer. Nevertheless, the latter is more
cost-effective. The less cost-effective is the Barrett reducer for
the great number of multipliers it needs.
2. As regards the computation time for nonfixed-m reducers,
k
the fastest is the Barrett reducer. The mod 2 − a and the pre-
computation-based reducers are slower, while the nonrestor-
ing and SRT reducers are the slowest. Those experimental
results do not completely confirm the theoretical results of
Tables 2.1 and 2.2. The SRT reducers are not so fast as was

71 72 73 74 75 76 77 78 79 80 81