Page 293 - ARM 64 Bit Assembly Language
32-bit Fixed Point Assembly: The sine function is computed using the code shown in Listing 8.7.
32-bit Fixed Point C: The sine function is computed using the code shown in Listing 8.6.
Single Precision C: Sine is computed using the floating point sine function which is provided
by the C compiler. The C code is written to use IEEE single precision floating point numbers.
Double Precision C: Same as the previous method, but using IEEE double precision instead of
single precision.
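As a rough illustration of what a 32-bit fixed point sine can look like in C, the following
sketch evaluates the Taylor series with 29 fractional bits. It is not Listing 8.6: the format
(FRAC_BITS), the names fix_mul, fix_sin, fix_from_double, and fix_to_double, and the use of
integer division for the factorial terms are all assumptions made for this example. The input
is assumed to be already reduced to the range [-pi/2, pi/2].

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical format for this sketch: signed 32-bit, 29 fractional bits. */
#define FRAC_BITS   29
#define FIX_ONE     ((int32_t)1 << FRAC_BITS)
#define FIX_HALF_PI ((int32_t)843314857)   /* pi/2 scaled by 2^29, rounded */

/* Convert a double to fixed point, rounding to nearest. */
int32_t fix_from_double(double v)
{
    return (int32_t)(v * FIX_ONE + (v < 0 ? -0.5 : 0.5));
}

/* Convert fixed point back to double (for checking results only). */
double fix_to_double(int32_t f)
{
    return (double)f / FIX_ONE;
}

/* Multiply two fixed point values using a 64-bit intermediate product. */
int32_t fix_mul(int32_t a, int32_t b)
{
    return (int32_t)(((int64_t)a * b) / FIX_ONE);
}

/* Taylor series for sine through the x^13 term.
 * Each term is the previous term times -x^2 / (n * (n + 1)). */
int32_t fix_sin(int32_t x)
{
    assert(x >= -FIX_HALF_PI && x <= FIX_HALF_PI);  /* no range reduction here */

    int32_t x2   = fix_mul(x, x);
    int32_t term = x;
    int32_t sum  = x;

    for (int n = 2; n <= 12; n += 2) {
        term = -fix_mul(term, x2) / (n * (n + 1));
        sum += term;
    }
    return sum;
}
```

For example, fix_to_double(fix_sin(fix_from_double(0.5))) should agree with the floating point
value of sin(0.5) to roughly six decimal places. A tuned implementation would likely replace
the integer divisions with multiplications by precomputed reciprocals.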
Each of the four implementations was compiled both with and without compiler optimizations,
resulting in a total of eight test cases. All cases were run on an NVIDIA Jetson TX2.
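One way to time test cases like these is a small harness such as the sketch below. It is an
assumption about methodology, not the procedure used to produce Table 8.4. The names sine_fn,
seconds_per_call, and stand_in_sine are hypothetical, and every candidate is assumed to be
wrapped behind a common signature that takes and returns a double. The standard clock()
function used here measures CPU time with limited resolution.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical common signature for every implementation under test. */
typedef double (*sine_fn)(double);

/* Placeholder candidate (two-term Taylor series), used only to exercise
 * the harness. It is not one of the four implementations. */
double stand_in_sine(double x)
{
    return x - x * x * x / 6.0;
}

/* Call fn the requested number of times on inputs in [0, 1) and return
 * the average CPU time per call in seconds. */
double seconds_per_call(sine_fn fn, long calls)
{
    assert(calls > 0);

    volatile double sink = 0.0;   /* keeps the optimizer from deleting the loop */
    clock_t start = clock();
    for (long i = 0; i < calls; i++) {
        sink += fn((double)i / (double)calls);
    }
    clock_t stop = clock();
    (void)sink;

    return (double)(stop - start) / CLOCKS_PER_SEC / (double)calls;
}
```

Building the same source twice, for example with -O0 and with -O3 under GCC, and calling
seconds_per_call with a large call count gives one number per test case.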
From Table 8.4, it is clear that the carefully written fixed point assembly implementation
beats the code generated by the compiler in every case. The closest that the compiler can get
is the C version of the fixed point algorithm when the compiler is run with full optimization.
Even in that case, the fixed point assembly implementation is almost 14% faster. The fixed
point assembly code is 98% faster than the single precision floating point implementation, and
has 33% more precision (32 bits versus 24 bits). Note that even with floating point hardware
support, the fixed point assembly implementation is almost twice as fast as the optimized
floating point implementation provided by the compiler. For processors without hardware
floating point support, fixed point arithmetic can be twenty or more times faster than
floating point.
Similar results could be obtained on any processor architecture, and for any reasonably complex
mathematical problem. When developing software for small systems, the developer must weigh the
costs and benefits of alternative implementations. For battery powered systems, it is important
to realize that choices of hardware and software can affect power consumption even more
strongly than computing performance. First, the power used by a system which includes a
hardware floating point processor will be consistently higher than that of a system without
one. Second, the reduction in processing time required for the job is closely related to the
reduction in power required. Therefore, for battery operated systems, a fixed point
implementation could greatly extend battery life. The following statements summarize the
results from the experiment in this section.
1. With some effort, a competent assembly programmer can beat the compiler, in some
cases by a very large margin.
2. If computational performance is critical, then a well-designed fixed point implementation
will usually outperform even a hardware-accelerated floating point implementation.
3. If there is no hardware support for floating point, then floating point performance is
extremely poor, and fixed point will always provide the best performance.