Page 293 - ARM 64 Bit Assembly Language


282 Chapter 8

32-bit Fixed Point Assembly: The sine function is computed using the code shown in Listing 8.7.
32-bit Fixed Point C: The sine function is computed using the code shown in Listing 8.6.
Single Precision C: Sine is computed using the floating point sine function provided by the C compiler's math library. The C code is written to use IEEE single precision floating point numbers.
Double Precision C: Same as the previous method, but using IEEE double precision instead of single precision.

Each of the four implementations was compiled both with and without compiler optimizations, giving a total of eight test cases. All cases were run on an NVIDIA Jetson TX2.

From Table 8.4, it is clear that the carefully written fixed point assembly implementation beats the code generated by the compiler in every case. The closest the compiler comes is the C version of the fixed point algorithm compiled with full optimization, and even then the fixed point assembly implementation is almost 14% faster. The fixed point assembly code is 98% faster than the single precision floating point implementation and has 33% more precision (32 bits versus 24 bits). Note that even with hardware floating point support, the fixed point assembly implementation is almost twice as fast as the optimized floating point implementation provided by the compiler. On processors without hardware floating point support, fixed point arithmetic can be twenty or more times faster than floating point.
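The percentages above can be checked with a little arithmetic. The sketch below takes 24 bits as the significand width of an IEEE single precision number (23 stored bits plus the implicit leading 1) and reads "x% faster" as a speedup $S = t_{\text{slow}} / t_{\text{fast}} = 1 + x/100$; that reading is an interpretation, since the text does not define the term.

```latex
\[
\frac{32 - 24}{24} = \frac{1}{3} \approx 33\%
\]
\[
S = \frac{t_{\text{float}}}{t_{\text{fixed}}} \approx 1.98
\quad\Longrightarrow\quad
t_{\text{fixed}} \approx \frac{t_{\text{float}}}{1.98} \approx 0.505\, t_{\text{float}}
\]
\[
S = \frac{t_{\text{C,\,optimized}}}{t_{\text{asm}}} \approx 1.14
\quad\Longrightarrow\quad
t_{\text{asm}} \approx 0.88\, t_{\text{C,\,optimized}}
\]
```

Under this reading, "98% faster" and "almost twice as fast" describe the same ratio: the fixed point assembly code needs roughly half the time of the floating point version.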
Similar results could be obtained on any processor architecture and for any reasonably complex mathematical problem. When developing software for small systems, the developer must weigh the costs and benefits of alternative implementations. For battery-powered systems, it is important to realize that choices of hardware and software can affect power consumption even more strongly than they affect computing performance. First, the power used by a system that includes a hardware floating point processor will be consistently higher than that of a system without one. Second, a reduction in the processing time required for a job brings a closely related reduction in the energy the job consumes. Therefore, for battery-operated systems, a fixed point implementation could greatly extend battery life. The following statements summarize the results of the experiment in this section.

1. With some effort, a competent assembly programmer can beat the compiler, in some cases by a very large margin.
2. If computational performance is critical, then a well-designed fixed point implementation will usually outperform even a hardware-accelerated floating point implementation.
3. If there is no hardware support for floating point, then floating point performance is extremely poor, and fixed point will always provide the best performance.
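The battery-life argument made earlier in this section rests on the relation between energy, power, and time. The following is a minimal sketch, assuming the average power $\bar{P}$ stays roughly constant while a job runs, ignoring idle power, and assuming the fixed point code takes the same time with or without floating point hardware present. Here $S = t_{\text{float}} / t_{\text{fixed}}$ is the speedup of the fixed point version.

```latex
\[
E = \bar{P}\, t
\]
\[
\frac{E_{\text{fixed}}}{E_{\text{float}}}
  = \frac{\bar{P}_{\text{fixed}}}{\bar{P}_{\text{float}}}
    \cdot \frac{t_{\text{fixed}}}{t_{\text{float}}}
  = \frac{\bar{P}_{\text{fixed}}}{\bar{P}_{\text{float}}} \cdot \frac{1}{S}
  \;\le\; \frac{1}{S}
  \qquad \text{when } \bar{P}_{\text{fixed}} \le \bar{P}_{\text{float}}
\]
```

With a speedup of about 2, as measured above, the fixed point version would use at most about half the energy per job; on a system that omits the floating point hardware entirely, the lower average power makes the ratio smaller still.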
