7.5 Floating-Point Arithmetic and Conversion
We now consider the essential elements of the IEEE 754 standard 32-bit
floating-point representation. The numbers represented are also called single-precision
floating-point numbers, and we shall refer to them here simply as floating-point
numbers. The format is shown below.

    bit:  31 | 30 ........ 23 | 22 ...................... 0
           s |       e        |            f
In the drawing, s is the sign bit for the significand, and f represents the 23-bit
fractional part of the significand magnitude with the hidden bit, as above, to the left of
the binary point. The exponent is determined from e by a bias of 127, that is, an e of
127 represents an exponent of 0, an e of 129 represents an exponent of +2, an e of 120
represents an exponent of -7, and so on. The hidden bit is taken to be 1 unless e has the
value 0. The floating-point numbers given by
are called normalized, (In the IEEE standard, an e of 255 is used to represent ±_infmity
together with values that are not to be interpreted as numbers but are used to signal the
user that his calculation may no longer be valid.) The value of 0 for e is also used to
represent denormalized floating-point numbers, namely,
    (-1)^s × 2^(-126) × 0.f    for e = 0, f ≠ 0
Denormalized floating-point numbers allow the representation of small numbers with
magnitudes between 0 and 2^(-126). In particular, notice that the exponent for the
denormalized floating-point numbers is taken to be -126, rather than -127, so that the
interval between 0 and 2^(-126) contains 2^23 - 1 uniformly spaced denormalized floating-
point numbers.
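To make the field layout, the bias, and the hidden-bit rule concrete, here is a minimal C sketch that unpacks a 32-bit single into its s, e, and f fields and rebuilds the value for both the normalized and denormalized cases. The function name unpack and the sample values are illustrative, not from the text, and the e = 255 case (±infinity and the not-a-number values) is ignored for brevity.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <math.h>

/* Unpack a 32-bit IEEE single into s, e, f and rebuild its value. */
static double unpack(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);       /* reinterpret the 32 bits     */

    uint32_t s = bits >> 31;              /* 1-bit sign                  */
    uint32_t e = (bits >> 23) & 0xFF;     /* 8-bit biased exponent       */
    uint32_t f = bits & 0x7FFFFF;         /* 23-bit fraction             */

    double sign = s ? -1.0 : 1.0;
    if (e == 0)   /* denormalized: hidden bit 0, exponent fixed at -126 */
        return sign * ldexp((double)f / (1 << 23), -126);
    else          /* normalized: hidden bit 1, exponent e - 127         */
        return sign * ldexp(1.0 + (double)f / (1 << 23), (int)e - 127);
}

int main(void)
{
    float samples[] = { 1.0f, -6.5f, 1e-40f };   /* 1e-40 is denormalized */
    for (int i = 0; i < 3; i++)
        printf("%g unpacks to %g\n", samples[i], unpack(samples[i]));
    return 0;
}
```

Note that an e of 0 with f = 0 represents ±0, which the denormalized formula also yields.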
Although the format above might seem a little strange, it turns out to be convenient
because a comparison between normalized floating-point numbers is exactly the same as
a comparison between 32-bit signed-magnitude integers represented by the string s, e, f.
This means that a computer implementing signed-magnitude integer arithmetic does not
need a separate 32-bit compare for integers and floating-point numbers. In larger
machines with 32-bit words, this translates into a hardware savings, while in smaller
machines, like the 6812, it means that only one subroutine has to be written instead of
two if signed-magnitude arithmetic for integers is to be implemented.
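This claim is easy to check on any machine with IEEE arithmetic. The C sketch below (the name sm_cmp and the test values are illustrative, not from the text) compares the raw 32-bit patterns as signed-magnitude integers and verifies that the ordering agrees with ordinary floating-point comparison for nonzero normalized numbers; zero is excluded because +0 and -0 have different bit patterns but compare equal as floats.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Compare two 32-bit patterns as signed-magnitude integers:
   returns -1, 0, or +1 as a < b, a == b, or a > b. */
static int sm_cmp(uint32_t a, uint32_t b)
{
    uint32_t sa = a >> 31, ma = a & 0x7FFFFFFFu;  /* sign; magnitude = the e,f string */
    uint32_t sb = b >> 31, mb = b & 0x7FFFFFFFu;

    if (sa != sb) return sa ? -1 : 1;   /* opposite signs: the negative one is smaller */
    if (ma == mb) return 0;
    int mag = (ma < mb) ? -1 : 1;       /* compare magnitudes...                       */
    return sa ? -mag : mag;             /* ...and reverse the answer for negatives     */
}

int main(void)
{
    float t[] = { -1024.0f, -3.0f, -0.5f, 0.5f, 2.0f, 1024.0f };
    int n = sizeof t / sizeof t[0], ok = 1;
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++) {
            uint32_t a, b;
            memcpy(&a, &t[i], 4);
            memcpy(&b, &t[j], 4);
            int want = (t[i] < t[j]) ? -1 : (t[i] > t[j]) ? 1 : 0;
            if (sm_cmp(a, b) != want) ok = 0;
        }
    printf(ok ? "orderings agree\n" : "mismatch!\n");
    return 0;
}
```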
We now look more closely at the ingredients that floating-point algorithms must
have for addition, subtraction, multiplication, and division. For simplicity, we focus our
attention on these operations when the inputs are normalized floating-point numbers and
the result is expressed as a normalized floating-point number.
To add or subtract two floating-point numbers, one of the representations has to be
adjusted so that the exponents are equal before the significands are added or subtracted.
For accuracy, this unnormalization is always done to the number with the smaller
exponent, as the sketch below illustrates.
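The following bare-bones C fragment shows the alignment step on fixed example values; the variable names and constants are ours, and signs, rounding of the bits shifted out, and exponent underflow are all ignored in this sketch.

```c
#include <stdio.h>
#include <stdint.h>

/* Align and add the significands of two positive normalized numbers.
   Each value is a 24-bit significand in 1.f form (hidden bit at bit 23)
   times 2 to its exponent. */
int main(void)
{
    uint32_t siga = 0x800000; int expa = 3;   /* 1.000... * 2^3 = 8.0 */
    uint32_t sigb = 0xC00000; int expb = 0;   /* 1.100... * 2^0 = 1.5 */

    /* Unnormalize the number with the smaller exponent: shift its
       significand right one bit for each step up in the exponent. */
    if (expa < expb) { siga >>= (expb - expa); expa = expb; }
    else             { sigb >>= (expa - expb); expb = expa; }

    uint32_t sum = siga + sigb;          /* significands now share an exponent */
    /* Renormalize if the sum carried past the hidden-bit position. */
    if (sum & 0x1000000) { sum >>= 1; expa += 1; }

    printf("sum = %g\n", (double)sum / 0x800000 * (1 << expa));
    return 0;
}
```

Running this prints sum = 9.5, the value of 1.0 × 2^3 + 1.5 × 2^0.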
For example, to add the two floating-point numbers