Page 231 - Introduction to Microcontrollers Architecture, Programming, and Interfacing of The Motorola 68HC12
208 Chapter 7 Arithmetic Operations
The rounding process for addition of numbers with opposite signs (i.e., subtraction) is
exactly like that above, except that the round byte must be included in the subtraction,
and renormalization may be necessary after the significands are subtracted. In this
renormalization step, several left shifts of the significand may be required, where each
shift needs a bit b for the least significant bit of the significand. This bit may be obtained
from the round byte as shown below. (The sticky bit may also be replaced by zero in the
process pictured without altering the final result. However, at least one round bit is
required.) After renormalization, the rounding process is identical to (16). As an example,
    2^0 * 1.0000...0
  - 2^-23 * 1.1110...0

becomes

    2^0 * 1.0000...00
  - 2^0 * 0.0000...01(11100000)
    2^0 * 0.1111...10(00100000)

which, after renormalization and rounding, becomes 2^-1 * 1.11...100. Subroutines
for floating-point addition and multiplication are given in Hiware's C and C++ libraries.
To illustrate the principles without an undue amount of detail, the subroutines are given
only for normalized floating-point numbers. Underflow is handled by flushing the result
to zero and setting an underflow flag, and overflow is handled by setting an overflow flag
and returning the largest possible magnitude with the correct sign. These subroutines
do not fully conform to the IEEE standard, but they illustrate the basic algorithms, including rounding. The
procedure for addition is summarized in Figure 7.20, where one should note that the
significands are added as signed-magnitude numbers.
One other issue with floating-point numbers is conversion. For example, how does
one convert the decimal floating-point number 3.45786 * 10^4 into a binary floating-point
number with the IEEE format? One possibility is to have a table of binary floating-point
numbers, one for each power of ten in the range of interest. One can then compute the
expression
3 * 10^4 + 4 * 10^3 + . . . + 6 * 10^-1
using the floating-point add and floating-point multiply subroutines. One difficulty with
this approach is that accuracy is lost because of the number of floating-point multiplies
and adds that are used. For example, for eight decimal digits in the decimal significand,
there are eight floating-point multiplies and seven floating-point adds used in the
conversion process. To get around this, one could write 3.45786 * 10^4 as .345786 * 10^5
and multiply the binary floating-point equivalent of 10^5 (obtained again from a table) by
the binary floating-point equivalent of .345786. This, of course, would take only one
floating-point multiply and a conversion of the decimal fraction to a binary floating-
point number.