Page 262 - ARM 64 Bit Assembly Language


                     8.3.2 Q notation

                     Fixed point number formats can also be represented using Q notation, which was developed
                     by Texas Instruments. Q notation is equivalent to the S/U format used in this book, except that
                     the integer portion is not always fully specified. In general, Q formats are specified as Qm,n
                     where m is the number of integer bits, and n is the number of fractional bits. If a fixed word
                     size w is being used then m may be omitted, and is assumed to be w − n. For example, a Q10
                     number has ten fractional bits, and the number of integer bits is not specified, but is assumed
                     to be the number of bits required to complete a word of data. A Q2,4 number has two integer
                     bits and four fractional bits in a six bit word. There are two conflicting conventions for deal-
                     ing with the sign bit. In one convention, the sign bit is included as part of m, and in the other
                     convention, it is not. When using Q notation, it is important to state which convention is being
                     used. Additionally, a U may be prefixed to indicate an unsigned value. For example, UQ8,8 is
                     equivalent to U(8,8), and Q7,9 is equivalent to S(7,9).
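
The relationship m = w − n and the meaning of the fractional bit count can be sketched in C. This is a minimal illustration, not a standard API: the function names q_implied_integer_bits, q_encode, and q_decode are hypothetical, values are held in a 32-bit signed integer, and no overflow checking is done. Whether the implied integer bits include the sign bit still depends on which of the two conventions is in use.

```c
#include <stdint.h>

/* Integer bits implied by an abbreviated Qn format stored in a
   word of word_bits bits: m = w - n. */
static int q_implied_integer_bits(int word_bits, int frac_bits)
{
    return word_bits - frac_bits;
}

/* Encode a real value as a raw integer with frac_bits fractional
   bits, rounding to the nearest representable value.
   Assumes the scaled result fits in 32 bits (no overflow check). */
static int32_t q_encode(double x, int frac_bits)
{
    double scaled = x * (double)(1LL << frac_bits);
    return (int32_t)(scaled + (scaled >= 0.0 ? 0.5 : -0.5));
}

/* Decode a raw integer with frac_bits fractional bits back to a real value. */
static double q_decode(int32_t raw, int frac_bits)
{
    return (double)raw / (double)(1LL << frac_bits);
}
```

For a Q10 number in a 16-bit word, q_implied_integer_bits(16, 10) gives 6, and q_encode(1.5, 10) gives the raw value 1536, since 1.5 × 2^10 = 1536.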


                     8.3.3 Properties of fixed point numbers

                     Once the decision has been made to use fixed point calculations, the programmer must make
                     some decisions about the specific representation of each fixed point variable. The combination
                     of size and radix will affect several properties of the numbers, including:


                     Precision:      the maximum number of non-zero bits representable,
                     Resolution:     the smallest non-zero magnitude representable,
                     Accuracy:       the magnitude of the maximum difference between a true real value and its
                                     approximate representation,
                     Range:          the difference between the largest and smallest number that can be repre-
                                     sented, and
                     Dynamic range: the ratio of the maximum absolute value to the minimum positive
                                     absolute value representable.
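
Several of these definitions can be sketched directly in C for a signed format S(i,f). This is a rough illustration under stated assumptions: the struct fixed_props and the function signed_fixed_props are hypothetical names, the precision counts one sign bit plus i integer bits and f fractional bits, and the minimum and maximum values assume a two's complement representation. Dynamic range is left out of the sketch.

```c
/* Hypothetical helper type: properties of a signed S(i,f) format. */
struct fixed_props {
    int    precision;   /* total bits: sign + i + f            */
    double resolution;  /* smallest non-zero magnitude: 2^-f   */
    double accuracy;    /* worst-case error: resolution / 2    */
    double min_value;   /* assumes two's complement: -2^i      */
    double max_value;   /* assumes two's complement: 2^i - 2^-f */
    double range;       /* max_value - min_value               */
};

/* Compute the properties of S(i,f); valid for 0 <= i, f < 63. */
static struct fixed_props signed_fixed_props(int i, int f)
{
    struct fixed_props p;
    double two_to_i = (double)(1ULL << i);   /* 2^i */
    double two_to_f = (double)(1ULL << f);   /* 2^f */

    p.precision  = 1 + i + f;
    p.resolution = 1.0 / two_to_f;
    p.accuracy   = p.resolution / 2.0;
    p.min_value  = -two_to_i;
    p.max_value  = two_to_i - p.resolution;
    p.range      = p.max_value - p.min_value;
    return p;
}
```

Calling signed_fixed_props(9, 6) gives a precision of 16 bits, a resolution of 0.015625, and an accuracy of 0.0078125.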


                     Given a number specified using the notation introduced previously, we can determine its prop-
                     erties. For example, an S(9,6) number has the following properties:


                     Precision:      P = 16 bits
                     Resolution:     R = 2^{-6} = 0.015625
                     Accuracy:       A = R/2 = 0.0078125