Page 263 - ARM 64 Bit Assembly Language


                  Range:          Minimum value is 1000000000.000000 = −512
                                  Maximum value is 0111111111.111111 = 511.984375
                                  Range is G = 511.984375 + 512 = 1023.984375
                  Dynamic range: For a signed fixed-point rational representation, S(i,f), the dynamic range
                                  is
                                  $$D = 2 \times \frac{2^{i}}{2^{-f}} = 2^{i+f+1} = 2^{P}.$$
                                  Therefore, the dynamic range of an S(9,6) is $2^{16} = 65536$.
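The properties listed above can be checked with a short C sketch. The helper names (`sfix_resolution`, `sfix_min`, `sfix_max`, `sfix_dynamic_range`) are hypothetical and chosen only for illustration. The sketch assumes an S(i,f) number has one sign bit, i integer bits, and f fractional bits, so the minimum is −2^i and the maximum is 2^i − 2^−f.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for a signed S(i,f) format:
   1 sign bit, i integer bits, f fractional bits. */

/* Value of one least-significant bit: 2^-f. */
double sfix_resolution(unsigned f) {
    assert(f < 63);
    return 1.0 / (double)(1ULL << f);
}

/* Most negative representable value: -2^i. */
double sfix_min(unsigned i) {
    assert(i < 63);
    return -(double)(1ULL << i);
}

/* Most positive representable value: 2^i - 2^-f. */
double sfix_max(unsigned i, unsigned f) {
    assert(i < 63);
    return (double)(1ULL << i) - sfix_resolution(f);
}

/* Dynamic range as given in the text: D = 2^(i+f+1). */
uint64_t sfix_dynamic_range(unsigned i, unsigned f) {
    assert(i + f + 1 < 64);
    return 1ULL << (i + f + 1);
}
```

With i = 9 and f = 6, `sfix_min` returns −512 and `sfix_dynamic_range` returns 65536, matching the S(9,6) example.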


                  Being aware of these properties, the programmer can select fixed point representations that fit
                  the task that they are trying to solve. This allows the programmer to strive for very efficient
                  code by using the smallest fixed point representation possible, while still guaranteeing that the
                  results of computations will be within some limits for error tolerance.


                  8.4 Fixed point operations

                  Fixed point numbers are actually stored as integers, and all of the integer mathematical
                  operations can be used. However, some care must be taken to track the radix point at each stage of
                  the computation. The advantages of fixed point calculations are that the operations are very
                  fast and can be performed on any computer, even if it does not have special hardware support
                  for non-integral numbers.
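To make the "stored as integers" point concrete, here is a minimal C sketch that assumes an S(7,8) value held in a 16-bit signed integer. The function names and the `FRAC_BITS` constant are hypothetical. The `double` conversions are used only to show the scaling; the stored value itself is an ordinary integer.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 8  /* assumed format: S(7,8) in an int16_t */

/* Scale a real value by 2^8 and round to the nearest integer. */
int16_t s7_8_from_double(double x) {
    double scaled = x * (double)(1 << FRAC_BITS);
    scaled += (scaled >= 0.0) ? 0.5 : -0.5;   /* round half away from zero */
    assert(scaled > -32769.0 && scaled < 32768.0);  /* must fit in 16 bits */
    return (int16_t)scaled;                   /* cast truncates toward zero */
}

/* Recover the real value by dividing the stored integer by 2^8. */
double s7_8_to_double(int16_t raw) {
    return (double)raw / (double)(1 << FRAC_BITS);
}
```

For example, 1.5 is stored as the integer 384 (1.5 × 256), and −0.25 is stored as −64.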



                  8.4.1 Fixed point addition and subtraction

                  Fixed point addition and subtraction work exactly like their integer counterparts. Fig. 8.1
                  gives some examples of fixed point addition with signed numbers. Note that in each case, the
                  numbers are aligned so that they have the same number of bits in their fractional part. This
                  requirement is the only difference between integer and fixed point addition. In fact, integer
                  arithmetic is just fixed point arithmetic with no bits in the fractional part. The arithmetic that
                  was covered in Chapter 7 was fixed point arithmetic using only S(i,0) and U(i,0) numbers.
                  In other words, integers and natural numbers can be considered as fixed point numbers where
                  the number of fractional bits, f, is zero. Now we are simply extending our knowledge to deal
                  with numbers where f ≠ 0. There are some rules which must be followed to ensure that the
                  results are correct. The rules for subtraction are the same as the rules for addition. Since we
                  are using two’s complement math, subtraction is performed using addition.
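When both operands already share the same format, the addition is an ordinary integer add. The following C sketch assumes two S(7,8) values stored in 16-bit signed integers. The `s7_8` type and function names are hypothetical, and the overflow check is an added precaution rather than something the text prescribes.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t s7_8;  /* hypothetical name: S(7,8) stored in 16 bits */

/* Both operands have 8 fractional bits, so the radix points are
   already aligned and a plain integer add gives an S(7,8) result. */
s7_8 s7_8_add(s7_8 a, s7_8 b) {
    int32_t sum = (int32_t)a + (int32_t)b;
    assert(sum >= INT16_MIN && sum <= INT16_MAX);  /* result must fit */
    return (s7_8)sum;
}

/* Subtraction follows the same rule as addition. */
s7_8 s7_8_sub(s7_8 a, s7_8 b) {
    int32_t diff = (int32_t)a - (int32_t)b;
    assert(diff >= INT16_MIN && diff <= INT16_MAX);
    return (s7_8)diff;
}
```

For instance, 2.75 is stored as 704 and 1.5 as 384. Their integer sum, 1088, is the S(7,8) encoding of 4.25, and their difference, 320, is the encoding of 1.25.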
                  Suppose we want to add an S(7,8) number to an S(7,4) number. The radix points are at
                  different locations, so we cannot simply add them. Instead, we must shift one of the numbers,