Page 263 - ARM 64 Bit Assembly Language


Range: Minimum value is 1000000000.000000 = −512
Maximum value is 0111111111.111111 = 511.984375
Range is G = 511.984375 + 512 = 1023.984375
Dynamic range: For a signed fixed-point rational representation, S(i,f), the dynamic range
is

$$D = 2 \times \frac{2^{i}}{2^{-f}} = 2^{i+f+1} = 2^{P}.$$

Therefore, the dynamic range of an S(9,6) is $2^{16} = 65536$.
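
The properties above can be checked numerically. The following is a minimal sketch in Python, used here only for illustration; the function name `signed_fixed_point_properties` is hypothetical, and the code assumes an S(i,f) value occupies one sign bit, i integer bits, and f fractional bits in two's complement form.

```python
from fractions import Fraction

def signed_fixed_point_properties(i, f):
    """Return (minimum, maximum, range, dynamic range) for an S(i,f) format.

    Assumption: one sign bit, i integer bits, and f fractional bits,
    stored as a two's complement integer of i + f + 1 bits.
    """
    resolution = Fraction(1, 2 ** f)         # value of one least significant bit
    minimum = Fraction(-(2 ** i))            # bit pattern 100...0
    maximum = Fraction(2 ** i) - resolution  # bit pattern 011...1
    value_range = maximum - minimum          # G = maximum - minimum
    dynamic_range = 2 ** (i + f + 1)         # D = 2 * 2^i / 2^-f
    return minimum, maximum, value_range, dynamic_range

# Evaluate the properties of the S(9,6) format discussed in the text.
minimum, maximum, value_range, dynamic_range = signed_fixed_point_properties(9, 6)
print(float(minimum), float(maximum), float(value_range), dynamic_range)
```

Exact fractions are used so that no rounding is introduced by the host language's floating point type before the values are printed.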

Being aware of these properties, the programmer can select fixed point representations that fit
the task that they are trying to solve. This allows the programmer to strive for very efficient
code by using the smallest fixed point representation possible, while still guaranteeing that the
results of computations will be within some limits for error tolerance.

8.4 Fixed point operations

Fixed point numbers are actually stored as integers, and all of the integer mathematical
operations can be used. However, some care must be taken to track the radix point at each stage of
the computation. The advantages of fixed point calculations are that the operations are very
fast and can be performed on any computer, even if it does not have special hardware support
for non-integral numbers.
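
As an illustration of storing a fixed point number as an integer, the sketch below scales a value by 2^f to obtain the stored integer and divides by 2^f to recover it. The helper names `to_fixed` and `from_fixed` are hypothetical, Python is used only for illustration, and the code does not check that the result fits in a particular word size.

```python
def to_fixed(value, f):
    """Encode a real value as an integer with f fractional bits.

    The value is scaled by 2^f and rounded to the nearest integer.
    Assumption: the caller ensures the result fits the chosen word size.
    """
    return round(value * (1 << f))

def from_fixed(raw, f):
    """Decode an integer with f fractional bits back to a real value."""
    return raw / (1 << f)

# 3.25 with 8 fractional bits: 3.25 * 256 = 832 (0x0340).
raw = to_fixed(3.25, 8)
print(hex(raw), from_fixed(raw, 8))

# The radix point is tracked by the programmer, not the hardware:
# the same stored integer means something different if f changes.
print(from_fixed(raw, 4))
```

The last line shows why the radix point must be tracked: the stored integer 832 decodes to 3.25 with eight fractional bits but to 52.0 with four.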

8.4.1 Fixed point addition and subtraction

Fixed point addition and subtraction work exactly like their integer counterparts. Fig. 8.1
gives some examples of fixed point addition with signed numbers. Note that in each case, the
numbers are aligned so that they have the same number of bits in their fractional part. This
requirement is the only difference between integer and fixed point addition. In fact, integer
arithmetic is just fixed point arithmetic with no bits in the fractional part. The arithmetic that
was covered in Chapter 7 was fixed point arithmetic using only S(i,0) and U(i,0) numbers.
In other words, integers and natural numbers can be considered as fixed point numbers where
the number of fractional bits, f, is zero. Now we are simply extending our knowledge to deal
with numbers where f ≠ 0. There are some rules which must be followed to ensure that the
results are correct. The rules for subtraction are the same as the rules for addition. Since we
are using two's complement math, subtraction is performed using addition.
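
The following sketch shows addition and subtraction of two operands that already share the same format, here assumed to be S(7,8) held in 16 bits. Python is used only for illustration, and the names `wrap_signed`, `fixed_add`, and `fixed_sub` are hypothetical. Because Python integers are unbounded, the 16-bit word is emulated with a mask.

```python
BITS = 16                 # assumed word size: 1 sign + 7 integer + 8 fraction bits
MASK = (1 << BITS) - 1
F = 8                     # fractional bits shared by both operands

def wrap_signed(raw):
    """Interpret the low 16 bits of raw as a two's complement value."""
    raw &= MASK
    return raw - (1 << BITS) if raw & (1 << (BITS - 1)) else raw

def fixed_add(a, b):
    """Add two stored values that have the same number of fractional bits."""
    return wrap_signed(a + b)

def fixed_sub(a, b):
    """Subtract by adding the two's complement of b (invert, then add one)."""
    negated_b = (~b + 1) & MASK
    return wrap_signed(a + negated_b)

a = 640    # 2.5 in S(7,8):   2.5 * 256
b = -320   # -1.25 in S(7,8): -1.25 * 256

total = fixed_add(a, b)        # stored integer for 2.5 + (-1.25)
difference = fixed_sub(a, b)   # stored integer for 2.5 - (-1.25)
print(total / (1 << F), difference / (1 << F))
```

Only the final division by 2^8 knows where the radix point is; the addition and subtraction themselves are ordinary integer operations.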
Suppose we want to add an S(7,8) number to an S(7,4) number. The radix points are at
different locations, so we cannot simply add them. Instead, we must shift one of the numbers,
