Page 262 - ARM 64 Bit Assembly Language
8.3.2 Q notation
Fixed point number formats can also be represented using Q notation, which was developed
by Texas Instruments. Q notation is equivalent to the S/U format used in this book, except that
the integer portion is not always fully specified. In general, Q formats are specified as Qm,n
where m is the number of integer bits, and n is the number of fractional bits. If a fixed word
size w is being used then m may be omitted, and is assumed to be w − n. For example, a Q10
number has ten fractional bits, and the number of integer bits is not specified, but is assumed
to be the number of bits required to complete a word of data. A Q2,4 number has two integer
bits and four fractional bits in a six bit word. There are two conflicting conventions for dealing
with the sign bit. In one convention, the sign bit is included as part of m, and in the other
convention, it is not. When using Q notation, it is important to state which convention is being
used. Additionally, a U may be prefixed to indicate an unsigned value. For example, UQ8,8 is
equivalent to U(8,8), and, when the sign bit is not counted in m, Q7,9 is equivalent to S(7,9).
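The rules above for reading a Q format can be sketched in a few lines of Python. This is only an illustration, not part of any standard tool: the function name parse_q and its word_size parameter are hypothetical, and the sketch accepts either a comma or a period between m and n because both spellings appear in practice. It does not decide whether m includes the sign bit, since that depends on the convention in use.

```python
import re

def parse_q(spec, word_size=None):
    """Split a Q-format string into (is_unsigned, m, n).

    Hypothetical helper: accepts 'Qm,n', 'Qm.n', 'Qn', and 'UQ...' forms.
    When m is omitted it is taken to be word_size - n, as described in
    the text. Whether m counts the sign bit is left to the caller.
    """
    match = re.fullmatch(r"(U?)Q(?:(\d+)[.,])?(\d+)", spec)
    if match is None:
        raise ValueError(f"not a Q format: {spec!r}")
    unsigned, m_text, n_text = match.groups()
    n = int(n_text)
    if m_text is not None:
        m = int(m_text)
    elif word_size is not None:
        m = word_size - n          # integer bits fill the rest of the word
    else:
        raise ValueError("word size is required when m is omitted")
    return unsigned == "U", m, n

print(parse_q("Q2,4"))                 # (False, 2, 4)
print(parse_q("UQ8.8"))                # (True, 8, 8)
print(parse_q("Q10", word_size=16))    # (False, 6, 10)
```

Returning m and n separately, rather than a total width, keeps the sign-bit question visible to the caller, who must still state which convention applies.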
8.3.3 Properties of fixed point numbers
Once the decision has been made to use fixed point calculations, the programmer must make
some decisions about the specific representation of each fixed point variable. The combination
of size and radix will affect several properties of the numbers, including:
Precision: the maximum number of non-zero bits representable,
Resolution: the smallest non-zero magnitude representable,
Accuracy: the magnitude of the maximum difference between a true real value and its
approximate representation,
Range: the difference between the largest and smallest number that can be represented, and
Dynamic range: the ratio of the maximum absolute value to the minimum positive absolute
value representable.
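The definitions above can be turned into a small calculator. The following Python sketch is an illustration under stated assumptions, not the book's own method: the function name fixed_point_properties is hypothetical, signed values are assumed to be two's complement with one sign bit in addition to the integer and fractional bits, and accuracy is taken as half the resolution, which assumes rounding to the nearest representable value. Exact fractions are used so that no floating point error creeps into the results.

```python
from fractions import Fraction

def fixed_point_properties(int_bits, frac_bits, signed=True):
    """Properties of an S(i,f) or U(i,f) format, as exact values.

    Assumes two's complement for signed formats, with the sign bit
    counted separately from int_bits (an assumption of this sketch).
    """
    resolution = Fraction(2) ** -frac_bits            # smallest non-zero magnitude
    precision = int_bits + frac_bits + (1 if signed else 0)
    largest = Fraction(2) ** int_bits - resolution    # all value bits set
    smallest = -(Fraction(2) ** int_bits) if signed else Fraction(0)
    max_abs = max(abs(largest), abs(smallest))
    return {
        "precision": precision,
        "resolution": resolution,
        "accuracy": resolution / 2,                   # assumes round-to-nearest
        "range": largest - smallest,
        "dynamic_range": max_abs / resolution,
    }

props = fixed_point_properties(9, 6)    # the S(9,6) format
print(props["precision"])               # 16
print(float(props["resolution"]))       # 0.015625
print(float(props["accuracy"]))         # 0.0078125
```

For a signed format the largest absolute value here comes from the most negative number; a text that measures dynamic range from the largest positive value instead will get a slightly smaller ratio.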
Given a number specified using the notation introduced previously, we can determine its
properties. For example, an S(9,6) number has the following properties:
Precision: P = 16 bits
Resolution: $R = 2^{-6} = 0.015625$
Accuracy: $A = \frac{R}{2} = 0.0078125$