Page 226 - Introduction to Microcontrollers Architecture, Programming, and Interfacing of The Motorola 68HC12

7.5 Floating-Point Arithmetic and Conversion                         203

            We now consider the essential elements of the proposed IEEE standard 32-bit
        floating-point representation. The numbers represented are also called single precision
        floating-point numbers, and we shall refer to them here simply as floating-point
        numbers. The format is shown below.
            [Figure: the 32-bit single-precision format, consisting of a 1-bit sign s,
        an 8-bit biased exponent e, and a 23-bit fraction f, in that order from the
        most significant bit.]
            In the drawing, s is the sign bit for the significand, and f represents the 23-bit
        fractional part of the significand magnitude with the hidden bit, as above, to the left of
        the binary point. The exponent is determined from e by a bias of 127, that is, an e of
         127 represents an exponent of 0, an e of 129 represents an exponent of +2, an e of 120
        represents an exponent of -7, and so on. The hidden bit is taken to be 1 unless e has the
        value 0. The floating-point numbers given by


                              (-1)^s * 2^(e-127) * 1.f    for 0 < e < 255

        are called normalized. (In the IEEE standard, an e of 255 is used to represent
        ±infinity together with values that are not to be interpreted as numbers but are
        used to signal the user that the calculation may no longer be valid.) The value of
        0 for e is also used to represent denormalized floating-point numbers, namely,
                              (-1)^s * 2^(-126) * 0.f    for e = 0, f ≠ 0
        Denormalized floating-point numbers allow the representation of small numbers with
        magnitudes between 0 and 2^(-126). In particular, notice that the exponent for the
        denormalized floating-point numbers is taken to be -126, rather than -127, so that the
        interval between 0 and 2^(-126) contains 2^23 - 1 uniformly spaced denormalized
        floating-point numbers.
            Although the format above might seem a little strange, it turns out to be convenient
        because a comparison between normalized floating-point numbers is exactly the same as
        a comparison between 32-bit signed-magnitude integers represented by the string s, e, f.
        This means that a computer implementing signed-magnitude integer arithmetic need
        not have separate 32-bit compares for integers and floating-point numbers. In larger
        machines with 32-bit words, this translates into a hardware savings, while in smaller
        machines, like the 6812, it means that only one subroutine has to be written instead of
        two if signed-magnitude arithmetic for integers is to be implemented.
            We now look more closely at the ingredients that floating-point algorithms must
        have for addition, subtraction, multiplication, and division. For simplicity, we focus our
        attention on these operations when the inputs are normalized floating-point numbers and
        the result is expressed as a normalized floating-point number.
            To add or subtract two floating-point numbers, one of the representations has to be
        adjusted so that the exponents are equal before the significands are added or subtracted.
        For accuracy, this unnormalization is always done to the number with the smaller
        exponent. For example, to add the two floating-point numbers