Page 263 - ARM 64 Bit Assembly Language
Range: Minimum value is 1000000000.000000 = −512
Maximum value is 0111111111.111111 = 511.984375
Range is G = 511.984375 + 512 = 1023.984375
Dynamic range: For a signed fixed-point rational representation, S(i,f), the dynamic range
is
$$D = 2 \times \frac{2^{i}}{2^{-f}} = 2^{i+f+1} = 2^{P}.$$
Therefore, the dynamic range of an S(9,6) is $2^{16} = 65536$.
Being aware of these properties, the programmer can select fixed point representations that fit
the task that they are trying to solve. This allows the programmer to strive for very efficient
code by using the smallest fixed point representation possible, while still guaranteeing that the
results of computations will be within some limits for error tolerance.
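As a rough illustration of how these properties follow from i and f, the sketch below computes them for an arbitrary S(i,f) format. The `sfmt` type and the function names are hypothetical and are introduced only for this example.

```c
#include <stdint.h>

/* Hypothetical descriptor for a signed fixed-point format S(i,f).
   These names are illustrative and do not come from the text. */
typedef struct {
    int i;  /* integer bits, not counting the sign bit */
    int f;  /* fractional bits */
} sfmt;

/* Minimum value: sign bit set, all other bits clear, i.e. -2^i. */
double sfmt_min(sfmt s) { return -(double)(1ULL << s.i); }

/* Maximum value: sign bit clear, all other bits set, i.e. 2^i - 2^-f. */
double sfmt_max(sfmt s)
{
    return (double)(1ULL << s.i) - 1.0 / (double)(1ULL << s.f);
}

/* Range G = maximum - minimum. */
double sfmt_range(sfmt s) { return sfmt_max(s) - sfmt_min(s); }

/* Dynamic range D = 2 * 2^i / 2^-f = 2^(i+f+1). Assumes i + f + 1 < 64. */
uint64_t sfmt_dynamic_range(sfmt s) { return 1ULL << (s.i + s.f + 1); }
```

With `sfmt s = {9, 6};`, `sfmt_min(s)` evaluates to −512 and `sfmt_dynamic_range(s)` to 65536, which agrees with the S(9,6) figures above.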
8.4 Fixed point operations
Fixed point numbers are actually stored as integers, and all of the integer mathematical operations
can be used. However, some care must be taken to track the radix point at each stage of
the computation. The advantages of fixed point calculations are that the operations are very
fast and can be performed on any computer, even if it does not have special hardware support
for non-integral numbers.
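To make the idea of "stored as integers" concrete, here is a minimal sketch that assumes an S(7,8) value held in a 16-bit signed integer. The type name `s7_8` and the conversion helpers are hypothetical. The stored integer is simply the real value multiplied by 2^f, and the radix point exists only in the programmer's bookkeeping.

```c
#include <stdint.h>

#define FRAC_BITS 8      /* assumption: S(7,8), so f = 8 */

typedef int16_t s7_8;    /* 1 sign bit + 7 integer bits + 8 fractional bits */

/* Real value -> stored integer: scale by 2^f.
   The cast truncates toward zero; rounding and overflow checks are
   omitted to keep the sketch short. */
s7_8 s7_8_from_double(double x)
{
    return (s7_8)(x * (1 << FRAC_BITS));
}

/* Stored integer -> real value: divide by 2^f. */
double s7_8_to_double(s7_8 x)
{
    return (double)x / (1 << FRAC_BITS);
}
```

For example, 3.25 is stored as the integer 832 (3.25 × 256), and −1.5 is stored as −384. The conversions through `double` are only for illustration; once values are in fixed-point form, the computations need nothing but integer instructions.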
8.4.1 Fixed point addition and subtraction
Fixed point addition and subtraction work exactly like their integer counterparts. Fig. 8.1
gives some examples of fixed point addition with signed numbers. Note that in each case, the
numbers are aligned so that they have the same number of bits in their fractional part. This
requirement is the only difference between integer and fixed point addition. In fact, integer
arithmetic is just fixed point arithmetic with no bits in the fractional part. The arithmetic that
was covered in Chapter 7 was fixed point arithmetic using only S(i,0) and U(i,0) numbers.
In other words, integers and natural numbers can be considered as fixed point numbers where
the number of fractional bits, f , is zero. Now we are simply extending our knowledge to deal
with numbers where f ≠ 0. There are some rules which must be followed to ensure that the
results are correct. The rules for subtraction are the same as the rules for addition. Since we
are using two’s complement math, subtraction is performed using addition.
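The following sketch illustrates the point that, once both operands have the same number of fractional bits, fixed point addition and subtraction are plain integer operations. It assumes both operands are S(7,8) values stored in 16-bit signed integers. The names `s7_8`, `s7_8_add`, and `s7_8_sub` are hypothetical.

```c
#include <stdint.h>

typedef int16_t s7_8;    /* assumption: S(7,8) stored in 16 bits */

/* Both operands have 8 fractional bits, so their radix points are
   already aligned and the integer sum is the S(7,8) sum.
   Overflow is not checked in this sketch. */
s7_8 s7_8_add(s7_8 a, s7_8 b) { return (s7_8)(a + b); }

/* Subtraction follows the same rule as addition. */
s7_8 s7_8_sub(s7_8 a, s7_8 b) { return (s7_8)(a - b); }
```

With 3.25 stored as 832 and 1.5 stored as 384, `s7_8_add(832, 384)` yields 1216, the S(7,8) encoding of 4.75, and `s7_8_sub(832, 384)` yields 448, the encoding of 1.75. Setting the number of fractional bits to zero reduces both functions to ordinary integer arithmetic.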
Suppose we want to add an S(7,8) number to an S(7,4) number. The radix points are at different
locations, so we cannot simply add them. Instead, we must shift one of the numbers,