Page 330 - ARM 64 Bit Assembly Language


                       Name     Page   Operation
                       fsub     309    Subtract
                       ldnp     301    Load Non-Temporal Pair
                       ldp      301    Load Pair
                       scvtf    306    Convert Signed Fixed Point to Float
                       scvtf    305    Convert Signed Integer to Float Using FPCR Rounding Mode
                       stnp     301    Store Non-Temporal Pair
                       stp      301    Store Pair
                       ucvtf    306    Convert Unsigned Fixed Point to Float
                       ucvtf    305    Convert Unsigned Integer to Float Using FPCR Rounding Mode


                     9.10 Chapter summary


                     The AArch64 FP/NEON coprocessor adds a great deal of power to the ARM architecture.
                     The FP/NEON register set can hold more than twice as much data as the AArch64
                     integer registers. The additional instructions allow the programmer to work directly
                     with the most common IEEE 754 floating point formats. In the next chapter, we will
                     see that the ability to treat groups of registers as vectors yields a significant
                     performance improvement. The GCC compiler does not always make good use of these
                     advanced features, which can give the assembly programmer a big advantage when
                     high-performance code is needed.
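
                     As a small illustration of working directly with IEEE 754 values, the sketch
                     below combines two instructions from the table above, scvtf and fsub. It is only
                     a minimal example under stated assumptions: the function name isub_as_double is
                     hypothetical, GNU assembler syntax is assumed, and the two signed 64-bit integer
                     inputs are assumed to arrive in x0 and x1.

```asm
        // Sketch only: isub_as_double is a hypothetical name, not from the text.
        // Assumes two signed 64-bit integers in x0 and x1; leaves a double in d0.
        .text
        .global isub_as_double
        .type   isub_as_double, %function
isub_as_double:
        scvtf   d0, x0          // d0 = (double) x0, rounded per the FPCR rounding mode
        scvtf   d1, x1          // d1 = (double) x1
        fsub    d0, d0, d1      // d0 = d0 - d1, in double precision
        ret
        .size   isub_as_double, .-isub_as_double
```

                     Converting both operands before subtracting means the difference is computed in
                     double precision, so it cannot overflow the way a 64-bit integer subtraction
                     could, although very large inputs may be rounded during conversion.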


                     Exercises

                      9.1. How many registers does the FP/NEON coprocessor add to the AArch64 architecture?
                      9.2. What is the purpose of the AHP, DN, FZ, RMODE, and FZ16 bits in the FPCR?
                      9.3. How are floating point parameters passed to subroutines? How is a pointer to a floating
                           point value (or array of values) passed to a subroutine?
                      9.4. Write the following C code in AArch64 assembly:

                          for (x = 0.0; x != 10.0; x += 0.1)
                            {
                              .
                              .
                              .
                            }
                      9.5. In the previous exercise, the C code contains a subtle bug.
                           •   What is the bug?