Page 293 - ARM 64 Bit Assembly Language


                  32-bit Fixed Point Assembly: The sine function is computed using the code shown in
                       Listing 8.7.
                  32-bit Fixed Point C: The sine function is computed using the code shown in Listing 8.6.
                  Single Precision C: Sine is computed using the floating point sine function which is
                       provided by the C compiler. The C code is written to use IEEE single precision
                       floating point numbers.
                  Double Precision C: Same as the previous method, but using IEEE double precision instead
                       of single precision.
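
                  To make the fixed point approach concrete, the following is a minimal sketch of a
                  32-bit fixed point sine in C. It is not the code from Listing 8.6 or Listing 8.7. The
                  number format (28 fractional bits), the names fix28_t, fix_mul, fix_sin, double_to_fix,
                  and fix_to_double, and the choice of a Taylor series through x^11 are all assumptions
                  made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed format: signed 32-bit value with 28 fractional bits, range [-8, 8). */
#define FRAC_BITS 28
typedef int32_t fix28_t;

/* Largest accepted argument: 1.75, slightly above pi/2 (illustrative limit). */
#define MAX_ARG ((fix28_t)7 << (FRAC_BITS - 2))

/* Conversion helpers, intended for testing only. */
fix28_t double_to_fix(double x) { return (fix28_t)(x * (double)(1 << FRAC_BITS)); }
double fix_to_double(fix28_t x) { return (double)x / (double)(1 << FRAC_BITS); }

/* Multiply two fixed point values using a 64-bit intermediate product.
 * Right-shifting a negative value is implementation-defined in C; this
 * sketch assumes the usual arithmetic shift. */
static fix28_t fix_mul(fix28_t a, fix28_t b)
{
    return (fix28_t)(((int64_t)a * (int64_t)b) >> FRAC_BITS);
}

/* Approximate sin(x) for x in [-pi/2, pi/2] using
 * x - x^3/3! + x^5/5! - ... - x^11/11!. */
fix28_t fix_sin(fix28_t x)
{
    /* Keeps every intermediate product inside the representable range. */
    assert(x >= -MAX_ARG && x <= MAX_ARG);

    fix28_t x2 = fix_mul(x, x);
    fix28_t term = x;
    fix28_t sum = x;

    /* Each term is the previous one times -x^2 / ((2k)(2k+1)). */
    for (int k = 1; k <= 5; k++) {
        term = -fix_mul(term, x2) / ((2 * k) * (2 * k + 1));
        sum += term;
    }
    return sum;
}
```

                  For example, fix_to_double(fix_sin(double_to_fix(0.5235987755982988))) should be close
                  to 0.5, since the argument is approximately pi/6. Only integer multiplies, shifts,
                  divides, and adds are used, which is why this style of code does not depend on
                  floating point hardware.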

                  Each of the four implementations was compiled both with and without compiler
                  optimizations, resulting in a total of eight test cases. All cases were run on an
                  NVIDIA Jetson TX2.
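
                  The text does not show how the test cases were timed. The following is a minimal
                  sketch of one way such a measurement could be made in C, using the standard clock()
                  function. The names seconds_per_call, sample_work, and sink are hypothetical, and
                  sample_work is only a stand-in for one of the sine implementations.

```c
#include <assert.h>
#include <time.h>

/* Written through a volatile so the compiler cannot discard the work. */
static volatile unsigned sink;

/* Hypothetical workload standing in for one sine implementation. */
void sample_work(void)
{
    unsigned acc = 0;
    for (unsigned i = 0; i < 1000; i++)
        acc += i * i;
    sink = acc;
}

/* Return the average processor time, in seconds, of one call to work(). */
double seconds_per_call(void (*work)(void), long reps)
{
    assert(reps > 0);

    clock_t start = clock();
    for (long i = 0; i < reps; i++)
        work();
    clock_t stop = clock();

    return (double)(stop - start) / CLOCKS_PER_SEC / (double)reps;
}
```

                  Building the same source twice, once with optimization disabled and once with it
                  enabled (for example, with GCC's -O0 and -O3 options, which are an assumption here),
                  and calling seconds_per_call for each implementation would yield eight measurements
                  of the kind described above.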

                  From Table 8.4, it is clear that the carefully written fixed point assembly
                  implementation beats the code generated by the compiler in every case. The closest
                  that the compiler can get is the C version of the fixed point algorithm, compiled with
                  full optimization. Even in that case, the fixed point assembly implementation is almost
                  14% faster. The fixed point assembly code is 98% faster than the single precision
                  floating point implementation, and has 33% more precision (32 bits versus 24 bits).
                  Note that even with floating point hardware support, the fixed point assembly
                  implementation is almost twice as fast as the optimized floating point implementation
                  provided by the compiler. For processors without hardware floating point support,
                  fixed point arithmetic can be twenty or more times faster than floating point.
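
                  The precision figures can be checked directly. An IEEE single precision number has a
                  24-bit significand, so a 32-bit fixed point value carries 32/24, or about 1.33 times
                  as many significant bits. The sketch below, with the hypothetical helper names
                  float_significand_bits and through_float, shows the smallest positive integer that a
                  32-bit integer holds exactly but a single precision float does not.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>

/* Number of significand bits in a float (24 for IEEE single precision). */
int float_significand_bits(void)
{
    return FLT_MANT_DIG;
}

/* Round-trip a 32-bit integer through a single precision float. */
int32_t through_float(int32_t v)
{
    /* Stay well inside the range where converting back is defined. */
    assert(v >= -(1 << 30) && v <= (1 << 30));

    volatile float f = (float)v;   /* volatile forces a true float store */
    return (int32_t)f;
}
```

                  On a system with IEEE single precision floats, through_float(16777216) returns its
                  argument unchanged, while through_float(16777217) does not, because 2^24 + 1 needs
                  25 significant bits.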

                  Similar results could be obtained on any processor architecture, and for any
                  reasonably complex mathematical problem. When developing software for small systems,
                  the developer must weigh the costs and benefits of alternative implementations. For
                  battery powered systems, it is important to realize that choices of hardware and
                  software can affect power consumption even more strongly than computing performance.
                  First, the power used by a system which includes a hardware floating point processor
                  will be consistently higher than that of a system without one. Second, the reduction
                  in processing time required for the job is closely related to the reduction in energy
                  required. Therefore, for battery operated systems, a fixed point implementation could
                  greatly extend battery life.

                  The following statements summarize the results from the experiment in this section.

                  1. With some effort, a competent assembly programmer can beat the compiler, in some
                     cases by a very large margin.
                  2. If computational performance is critical, then a well-designed fixed point implementation
                     will usually outperform even a hardware-accelerated floating point implementation.
                  3. If there is no hardware support for floating point, then floating point performance is ex-
                     tremely poor, and fixed point will always provide the best performance.
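
                  The battery life argument made earlier in this section rests on the fact that energy
                  is power multiplied by time. The following sketch puts rough numbers on it. The
                  function names and every figure used with them are hypothetical and are not
                  measurements from the experiment.

```c
#include <assert.h>

/* Average power (mW) of a processor that is active for the fraction
 * `duty` of the time and idle for the remainder. */
double average_power_mw(double active_mw, double idle_mw, double duty)
{
    assert(duty >= 0.0 && duty <= 1.0);
    return duty * active_mw + (1.0 - duty) * idle_mw;
}

/* Hours of operation from a battery of the given capacity (mWh). */
double battery_hours(double capacity_mwh, double avg_power_mw)
{
    assert(avg_power_mw > 0.0);
    return capacity_mwh / avg_power_mw;
}
```

                  With an assumed 1000 mW active draw and 100 mW idle draw, a job that keeps the
                  processor busy half of the time averages 550 mW. If a faster implementation halves
                  the busy time, the average falls to 325 mW, and the same battery lasts
                  correspondingly longer.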