Page 302 - ARM 64 Bit Assembly Language

Non-integral mathematics 291

                     the radix point throughout the computation. Floating point representations allow the radix
                     point to be tracked automatically, but require much more complex software and/or hardware.
                     Fixed point will usually provide better performance than floating point, but requires more
                     programming skill.
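As a rough illustration of what tracking the radix point by hand involves, the following C sketch multiplies two signed fixed-point values. The format (32-bit values with 16 fractional bits) and the names FRAC_BITS, fx_from_int, and fx_mul are assumptions made for this example only, not definitions taken from the text.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical format for this sketch: 32-bit signed values with
   16 bits to the right of the radix point. */
#define FRAC_BITS 16

/* Convert a small integer to fixed point by scaling it by 2^FRAC_BITS. */
int32_t fx_from_int(int32_t n)
{
    assert(n > -32768 && n < 32768);   /* must fit in the integer part */
    return n * (1 << FRAC_BITS);
}

/* Multiply two fixed-point values. The raw 64-bit product has
   2 * FRAC_BITS fractional bits, so it is shifted right by FRAC_BITS
   to put the radix point back where the format expects it.
   Note: right-shifting a negative value is implementation-defined in C;
   this sketch assumes the usual arithmetic shift. */
int32_t fx_mul(int32_t a, int32_t b)
{
    int64_t wide = (int64_t)a * (int64_t)b;
    return (int32_t)(wide >> FRAC_BITS);
}
```

For example, with this format 1.5 is stored as 0x18000 and 2.5 as 0x28000, and fx_mul(0x18000, 0x28000) yields 0x3C000, which is 3.75. The shift inside fx_mul is the bookkeeping that floating point hardware would otherwise do automatically.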

                     Fractional numbers in radix notation may not terminate in all bases. Numbers which terminate
                     in base two will also terminate in base ten, but the converse is not true. Programmers should
                     avoid counting using fractions which do not terminate in base two, because it leads to the
                     accumulation of round-off errors.
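The effect can be seen with a short C sketch, assuming IEEE 754 double precision with round-to-nearest. The function name accumulate is hypothetical and exists only for this illustration.

```c
#include <assert.h>

/* Add `step` to a running total `count` times, the way a loop that
   counts in fractional increments would. */
double accumulate(double step, int count)
{
    assert(count >= 0);
    double total = 0.0;
    for (int i = 0; i < count; i++)
        total += step;      /* each addition may round the result */
    return total;
}
```

Under these assumptions, accumulate(0.125, 8) compares equal to 1.0, because 0.125 terminates in base two and every partial sum is exact. In contrast, accumulate(0.1, 10) does not compare equal to 1.0, because 0.1 does not terminate in base two and the rounding errors add up. Counting with an integer and scaling once at the end avoids the problem.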



                     Exercises
                      8.1. Perform the following base conversions:
                            a. Convert 10110.001₂ to base ten.
                            b. Convert 11000.0101₂ to base ten.
                            c. Convert 10.125₁₀ to binary.
                      8.2. Complete the following table (assume all values represent positive fixed-point
                           numbers):

                                 Base 10        Base 2           Base 16        Base 13
                                 -------        ----------       -------        -------
                                 49.125
                                                101011.011
                                                                 AF.3
                                                                                12

                      8.3. You are working on a problem involving real numbers between −2 and 2, on a
                           computer that has 16-bit integer registers and has no hardware floating point
                           support. You decide to use 16-bit fixed point arithmetic.
                            a. What fixed point format should you use?
                            b. Draw a diagram showing the sign, if any, radix point, integer part, and fractional
                               part.
                            c. What are the precision, resolution, accuracy, and range of your format?
                      8.4. What is the resulting type of each of the following fixed point operations?
                            a. S(24,7) × S(27,15)
                            b. S(3,4) ÷ U(4,20)
                      8.5. Convert 26.640625₁₀ to a binary U(18,14) representation. Show the AArch64
                           assembly code necessary to load that value into register r4.
                      8.6. For each of the following fractions, indicate whether or not it will terminate in bases 2,
                           5, 7, and 10.