Page 262 - ARM 64 Bit Assembly Language


Non-integral mathematics 251

8.3.2 Q notation

Fixed point number formats can also be represented using Q notation, which was developed
by Texas Instruments. Q notation is equivalent to the S/U format used in this book, except that
the integer portion is not always fully specified. In general, Q formats are specified as Qm,n,
where m is the number of integer bits and n is the number of fractional bits. If a fixed word
size w is being used, then m may be omitted and is assumed to be w − n. For example, a Q10
number has ten fractional bits; the number of integer bits is not specified, but is assumed
to be the number of bits required to complete a word of data. A Q2,4 number has two integer
bits and four fractional bits in a six-bit word. There are two conflicting conventions for dealing
with the sign bit. In one convention, the sign bit is included as part of m; in the other
convention, it is not. When using Q notation, it is important to state which convention is being
used. Additionally, a U may be prefixed to indicate an unsigned value. For example, UQ8.8 is
equivalent to U(8,8), and, under the convention that does not count the sign bit in m, Q7,9 is
equivalent to S(7,9).
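Because the meaning of m depends on the sign-bit convention, it can help to make the mapping between Q notation and the S/U notation explicit. The following is a minimal Python sketch; the function names are hypothetical, and the assumption that i in S(i,f) never counts the sign bit is inferred from the S(9,6) example in Section 8.3.3, whose precision is 16 = 1 + 9 + 6 bits.

```python
def q_word_bits(m: int, n: int, signed: bool = True, sign_in_m: bool = False) -> int:
    """Total word width of a Qm,n (signed) or UQm,n (unsigned) number.

    sign_in_m selects one of the two sign-bit conventions:
    True  -> the sign bit is one of the m integer bits,
    False -> the sign bit is stored in addition to the m integer bits.
    """
    if signed and not sign_in_m:
        return 1 + m + n  # extra sign bit beyond the m integer bits
    return m + n


def q_default_m(w: int, n: int) -> int:
    """Integer bits assumed when m is omitted (e.g. Q10) in a w-bit word."""
    return w - n


def q_to_su(m: int, n: int, signed: bool = True, sign_in_m: bool = False) -> str:
    """Equivalent S(i,f) or U(i,f) label (assumes i never counts the sign bit)."""
    if not signed:
        return f"U({m},{n})"
    i = m - 1 if sign_in_m else m  # drop the sign bit from m if it was counted there
    return f"S({i},{n})"
```

With these helpers, `q_to_su(7, 9)` gives `"S(7,9)"` and `q_to_su(8, 8, signed=False)` gives `"U(8,8)"`, matching the examples in the text, while `q_to_su(7, 9, sign_in_m=True)` gives `"S(6,9)"`: the same Q label names a different format under the other convention, which is why the convention must be stated.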

8.3.3 Properties of fixed point numbers

Once the decision has been made to use fixed point calculations, the programmer must make
some decisions about the specific representation of each fixed point variable. The combination
of size and radix will affect several properties of the numbers, including:

Precision: the maximum number of non-zero bits representable,
Resolution: the smallest non-zero magnitude representable,
Accuracy: the magnitude of the maximum difference between a true real value and its
approximate representation,
Range: the difference between the largest and smallest numbers that can be represented, and
Dynamic range: the ratio of the maximum absolute value to the minimum positive absolute value
representable.
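These five definitions can be turned directly into arithmetic on i and f. The sketch below is a hypothetical helper, not the book's code; it assumes two's complement for signed S(i,f) values and round-to-nearest conversion (the assumption under which accuracy is half the resolution), and it uses exact fractions so that no floating point error creeps into the results.

```python
import math
from fractions import Fraction


def fixed_point_properties(i: int, f: int, signed: bool = True) -> dict:
    """Precision, resolution, accuracy, range and dynamic range of S(i,f) or U(i,f).

    Assumes two's complement for signed formats and round-to-nearest conversion.
    """
    res = Fraction(1, 2 ** f)          # weight of the least significant bit
    if signed:
        lo = -Fraction(2 ** i)         # most negative value: only the sign bit set
        precision = 1 + i + f          # sign bit + integer bits + fraction bits
    else:
        lo = Fraction(0)
        precision = i + f
    hi = Fraction(2 ** i) - res        # largest value: every non-sign bit set
    return {
        "precision": precision,
        "resolution": res,
        "accuracy": res / 2,
        "range": hi - lo,
        "dynamic_range": max(abs(lo), hi) / res,  # smallest positive value is res
    }


def quantize(x: float, f: int) -> float:
    """Round x to the nearest multiple of 2**-f (round half up); no overflow check."""
    return math.floor(x * 2 ** f + 0.5) / 2 ** f
```

For S(9,6), `fixed_point_properties(9, 6)` reports a precision of 16, a resolution of `Fraction(1, 64)` and an accuracy of `Fraction(1, 128)`; `quantize(3.14159, 6)` returns `3.140625`, which lies within that accuracy of the true value. The range and dynamic range figures depend on the two's complement assumption stated above.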

Given a number specified using the notation introduced previously, we can determine its
properties. For example, an S(9,6) number has the following properties:

Precision: $P = 16$ bits
Resolution: $R = 2^{-6} = 0.015625$
Accuracy: $A = \frac{R}{2} = 0.0078125$
