Page 39 - Applied Numerical Methods Using MATLAB

28    MATLAB USAGE AND COMPUTATIONAL ERRORS
           that case, it is not the computer but you, the user or the programmer, who
           is to blame for the wrong result. We should therefore always be careful not
           to let the computer produce a far-fetched output. In this section we will see how
           the computer represents and stores numbers. Then we will consider the causes
           and the propagation of computational errors, so that we are not deceived by
           the computer's unintentional mistakes and, it is hoped, can take some
           measures against them.

           1.2.1  IEEE 64-bit Floating-Point Number Representation
           By default, MATLAB uses the IEEE 64-bit (double-precision) floating-point number
           system to represent numbers. Each 64-bit word consists of a sign bit, an exponent
           field, and a mantissa field, laid out as follows:

             bit:  63 | 62 ............ 52 | 51 .............................. 0
                   S  | Exponent (11 bits) | Mantissa (52 bits)
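           As a quick check of this layout, the following Python sketch splits a number into
           its three fields. It assumes Python's `float` is an IEEE 754 double (true on all
           mainstream platforms, and the same format as MATLAB's default `double`); the
           function names `double_fields` and `bit_string` are hypothetical helpers, not part
           of MATLAB or of the text.

```python
import struct


def double_fields(f):
    """Hypothetical helper: split a 64-bit IEEE double into (S, Exp, mantissa field).

    S        = bit 63        (1 bit)
    Exp      = bits 62..52   (11 bits, stored in excess-1023 code)
    mantissa = bits 51..0    (52 bits; the hidden bit is not stored)
    """
    # Reinterpret the 8 bytes of the double as one unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", f))
    sign = bits >> 63
    exp = (bits >> 52) & 0x7FF            # keep the 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)     # keep the low 52 bits
    return sign, exp, mantissa


def bit_string(f):
    """Hypothetical helper: show the 64 bits of f as 'S EEEEEEEEEEE MMM...M'."""
    s, e, m = double_fields(f)
    return f"{s} {e:011b} {m:052b}"


if __name__ == "__main__":
    for x in (1.0, -2.5, 0.1):
        print(f"{x!r:>5}  {bit_string(x)}")
```

           For example, 1.0 has S = 0, Exp = 1023 (so E = 0), and an all-zero mantissa field,
           because its mantissa is exactly the hidden bit.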


           Each of these fields encodes S, E, and M of a number f, as described
           below.

              • Sign bit:

$$
S = b_{63} = \begin{cases} 0 & \text{for positive numbers} \\ 1 & \text{for negative numbers} \end{cases}
$$
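              • Exponent field ($b_{62} b_{61} b_{60} \cdots b_{52}$), adopting the excess-1023 code:

$$
\begin{aligned}
E &= \text{Exp} - 1023 = \{0, 1, \ldots, 2^{11} - 1 = 2047\} - 1023 = \{-1023, -1022, \ldots, +1023, +1024\} \\
  &= \begin{cases}
       -1023 + 1 & \text{for } |f| < 2^{-1022} \quad (\text{Exp} = 00000000000) \\
       -1022 \sim +1023 & \text{for } 2^{-1022} \le |f| < 2^{1024} \quad (\text{normalized ranges}) \\
       +1024 & \text{for } \pm\infty
     \end{cases}
\end{aligned}
$$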
              • Mantissa field ($b_{51} b_{50} \cdots b_1 b_0$):
              In the unnormalized range, where numbers are so small that they can be
           represented only with a hidden bit of 0, the number represented by the
           mantissa is

$$
M = 0.b_{51} b_{50} \cdots b_1 b_0 = [b_{51} b_{50} \cdots b_1 b_0] \times 2^{-52} \tag{1.2.1}
$$

           You might think of the value of the hidden bit as being added to the exponent
           (hence E = −1023 + 1 in this range) instead of to the mantissa.
              In the normalized range, the number represented by the mantissa together with
           the value of the hidden bit $b_h = 1$ is

$$
\begin{aligned}
M &= 1.b_{51} b_{50} \cdots b_1 b_0 = 1 + [b_{51} b_{50} \cdots b_1 b_0] \times 2^{-52} \\
  &= 1 + b_{51} \times 2^{-1} + b_{50} \times 2^{-2} + \cdots + b_1 \times 2^{-51} + b_0 \times 2^{-52}
\end{aligned}
$$
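           Together with the excess-1023 exponent, the two mantissa formulas determine a
           number as f = (−1)^S × M × 2^E. The Python sketch below checks this on a few
           values, again assuming Python's `float` is an IEEE 754 double. The names
           `double_fields`, `decode`, and `exponent_range` are hypothetical helpers;
           `fractions.Fraction` is used only because `Fraction(x)` is exactly the binary value
           stored for `x`, so the comparison introduces no rounding of its own.

```python
import struct
from fractions import Fraction


def double_fields(f):
    """Hypothetical helper: return (S, Exp, mantissa field) of a 64-bit double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", f))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


def decode(f):
    """Rebuild the exact value (-1)^S * M * 2^E of a finite double from its fields."""
    s, exp, frac = double_fields(f)
    if exp == 0x7FF:  # Exp = 11111111111 (E = +1024): infinity or NaN
        raise ValueError("not a finite number")
    if exp == 0:
        # Unnormalized range: hidden bit 0, M = [b51...b0] * 2^-52 (Eq. 1.2.1),
        # and E = -1023 + 1 = -1022.
        m = Fraction(frac, 1 << 52)
        e = -1022
    else:
        # Normalized range: hidden bit 1, M = 1 + [b51...b0] * 2^-52, E = Exp - 1023.
        m = 1 + Fraction(frac, 1 << 52)
        e = exp - 1023
    value = m * Fraction(2) ** e
    return -value if s else value


def exponent_range(f):
    """Classify f by its Exp field, following the exponent cases listed above."""
    exp = double_fields(f)[1]
    if exp == 0:
        return "unnormalized"      # |f| < 2^-1022 (including zero)
    if exp == 0x7FF:
        return "inf/nan"
    return "normalized"            # 2^-1022 <= |f| < 2^1024


if __name__ == "__main__":
    samples = [1.0, -2.5, 0.1, 2.0 ** -1022, 5e-324, 1.7976931348623157e308]
    for x in samples:
        # The rebuilt value must match the stored value exactly.
        assert decode(x) == Fraction(x)
        print(f"{x!r:>25}  {exponent_range(x)}")
```

           Here 2.0 ** -1022 is the smallest normalized number (Exp = 1, E = −1022), while
           5e-324, which equals 2^−1074, lies in the unnormalized range, where Eq. (1.2.1)
           applies with a single 1 in the last mantissa bit.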