Page 298 - ARM 64 Bit Assembly Language

Non-integral mathematics 287

                     8.7.4 IEEE 754 quad-precision

                     The IEEE 754 quad-precision format was designed to provide enough range and precision
                     for very demanding applications. It provides a 15-bit exponent and a 112-bit mantissa
                     (113 bits of precision, counting the hidden bit). This format is still not supported by
                     most hardware. The IBM POWER9 CPU fully supports quad precision in hardware. Some
                     other processors, such as SPARC V8 and V9, and PA-RISC, offer partial support. However,
                     for mid-range processors such as the Intel x86 family and the ARM, this format is still
                     out of reach. It may be supported by some compilers, but the operations are implemented
                     in software, and can take ten times as long (or more) as a hardware implementation.
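
                     As a rough illustration of what the quad-precision layout implies, the following Python
                     sketch derives the exponent bias, the exponent range, and the approximate decimal
                     precision from the binary128 field widths (1 sign bit, 15 exponent bits, 112 fraction
                     bits). The variable names are chosen here for illustration only.

```python
import math

# Field widths of the IEEE 754 binary128 (quad-precision) format.
SIGN_BITS = 1
EXPONENT_BITS = 15
FRACTION_BITS = 112

# The three fields together fill a 128-bit word.
total_bits = SIGN_BITS + EXPONENT_BITS + FRACTION_BITS

# The exponent is stored with a bias of 2^(k-1) - 1.
bias = 2 ** (EXPONENT_BITS - 1) - 1

# The all-zeros and all-ones exponent patterns are reserved, so
# normalized numbers use biased exponents 1 .. 2^k - 2.
min_exponent = 1 - bias
max_exponent = (2 ** EXPONENT_BITS - 2) - bias

# The hidden bit adds one bit of precision to the stored fraction.
precision_bits = FRACTION_BITS + 1

# Approximate number of decimal digits carried by the significand.
approx_decimal_digits = precision_bits * math.log10(2)

print(total_bits, bias, min_exponent, max_exponent, precision_bits)
print(round(approx_decimal_digits, 1))
```

                     The same arithmetic applies to the smaller IEEE 754 formats if the field widths are changed.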
                     8.8 Floating point operations


                     Many processors do not have hardware support for floating point. On those processors, all
                     floating point operations must be carried out in software. Processors that do support floating
                     point in hardware must have quite sophisticated circuitry to manage the basic operations on
                     data in the IEEE 754 standard formats. Regardless of whether the operations are carried out in
                     software or hardware, the basic arithmetic operations require multiple steps.


                     8.8.1 Floating point addition and subtraction

                     The steps required for addition and subtraction of floating point numbers are the same,
                     regardless of the specific format. The steps for adding or subtracting two floating point
                     numbers a and b are as follows:
                     1. Extract the exponents E_a and E_b.
                     2. Extract the significands M_a and M_b, and convert them into 2’s complement numbers,
                         using the signs S_a and S_b.
                     3. Shift the significand with the smaller exponent right by |E_a − E_b| bits.
                     4. Perform addition (or subtraction) on the significands to get the significand of the result,
                         M_r. Remember that the result may require one more significant bit to avoid overflow.
                     5. If M_r is negative, then take the 2’s complement and set S_r to 1. Otherwise set S_r to 0.
                     6. Shift M_r until the leftmost 1 is in the “hidden” bit position, and adjust the larger of
                         the two exponents by the shift amount to form the new exponent E_r: add the amount
                         for a right shift, and subtract it for a left shift.
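
                     The six steps can be sketched in Python for the IEEE 754 single-precision format. This is
                     a minimal illustration under several assumptions, not a complete implementation: both
                     inputs are normalized numbers (no zeros, subnormals, infinities, or NaNs), bits shifted
                     out during alignment are simply truncated rather than rounded, and exponent overflow or
                     underflow is not checked. The function names (float_to_bits, bits_to_float, fp_add,
                     fp_sub, add_floats) are hypothetical and chosen only for this example.

```python
import struct

def float_to_bits(x):
    """Return the 32-bit IEEE 754 single-precision pattern of x."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def bits_to_float(b):
    """Interpret a 32-bit pattern as an IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def fp_add(a_bits, b_bits):
    """Add two normalized single-precision numbers given as bit patterns."""
    # Step 1: extract the (biased) exponents.
    e_a = (a_bits >> 23) & 0xFF
    e_b = (b_bits >> 23) & 0xFF

    # Step 2: extract the significands, restore the hidden bit, and
    # make them signed. Python integers behave like 2's complement
    # numbers of unlimited width, so negation is sufficient here.
    m_a = (a_bits & 0x7FFFFF) | 0x800000
    m_b = (b_bits & 0x7FFFFF) | 0x800000
    if a_bits >> 31:
        m_a = -m_a
    if b_bits >> 31:
        m_b = -m_b

    # Step 3: shift the significand with the smaller exponent right.
    # The result is then expressed relative to the larger exponent.
    if e_a >= e_b:
        m_b >>= e_a - e_b
        e_r = e_a
    else:
        m_a >>= e_b - e_a
        e_r = e_b

    # Step 4: add the aligned significands. Python integers grow as
    # needed, so the possible extra bit cannot overflow.
    m_r = m_a + m_b

    # Step 5: convert back to sign and magnitude.
    s_r = 0
    if m_r < 0:
        m_r = -m_r
        s_r = 1
    if m_r == 0:
        return 0  # exact cancellation: return +0.0

    # Step 6: normalize so the leftmost 1 is in the hidden bit
    # position (bit 23), adjusting the exponent to compensate.
    while m_r >= 0x1000000:
        m_r >>= 1
        e_r += 1
    while m_r < 0x800000:
        m_r <<= 1
        e_r -= 1

    # Reassemble the fields, dropping the hidden bit.
    return (s_r << 31) | (e_r << 23) | (m_r & 0x7FFFFF)

def fp_sub(a_bits, b_bits):
    """Subtract by flipping the sign bit of b and adding."""
    return fp_add(a_bits, b_bits ^ 0x80000000)

def add_floats(x, y):
    """Convenience wrapper that works on Python floats."""
    return bits_to_float(fp_add(float_to_bits(x), float_to_bits(y)))

print(add_floats(1.5, 2.25))
print(bits_to_float(fp_sub(float_to_bits(1.0), float_to_bits(0.75))))
```

                     Subtraction reuses the addition routine by inverting the sign of the second operand, which
                     is why the same steps serve both operations. A production routine would also keep extra
                     guard bits during alignment so that the result can be rounded correctly.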