Page 41 - Applied Numerical Methods Using MATLAB

30    MATLAB USAGE AND COMPUTATIONAL ERRORS
$R_0 = [2^0, 2^1)$ with $Exp = 2^{10} - 1 = 1023$, $E = Exp - 1023 = 0$

$$
\begin{array}{cccl}
S & 011\cdots1111 & 0000\,0000\cdots0000\,0000 & (1+0)\times 2^{E} = (1+0)\times 2^{0} = 1\\
S & 011\cdots1111 & 0000\,0000\cdots0000\,0001 & (1+2^{-52})\times 2^{0}\\
\vdots & \vdots & \vdots & \vdots\\
S & 011\cdots1111 & 1111\,1111\cdots1111\,1111 & \{(1+(2^{52}-1)\,2^{-52}) = (2-2^{-52})\}\times 2^{0}
\end{array}
$$

Value of LSB: $2^{E-52} = 2^{-52}$
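You can inspect the bit patterns of the $E = 0$ range directly. Below is a minimal Python sketch, not the book's MATLAB code. It assumes Python 3.9 or later and uses the standard `struct` module to reinterpret a double as a 64-bit integer. The helper `fields` is a hypothetical name chosen for illustration.

```python
import math
import struct

def fields(x):
    """Hypothetical helper: split a 64-bit IEEE double into
    (sign bit S, biased exponent field Exp, 52-bit fraction field)."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exp_field = (bits >> 52) & 0x7FF     # 11-bit biased exponent Exp
    fraction = bits & ((1 << 52) - 1)    # hidden bit b_h = 1 is not stored
    return sign, exp_field, fraction

# First, second, and last bit patterns of R_0 = [2^0, 2^1)
for x in (1.0, 1.0 + 2.0**-52, 2.0 - 2.0**-52):
    s, exp_field, frac = fields(x)
    print(f"S={s} Exp={exp_field:011b} E={exp_field - 1023} frac={frac:052b} value={x!r}")

# Spacing between neighbouring doubles (value of the LSB) in R_0
print(math.ulp(1.0) == 2.0**-52)
```

Every number in this range shares the exponent field `01111111111` (1023). Only the fraction field changes, in steps of one LSB.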
4. The Largest Normalized Range (with the value of hidden bit $b_h = 1$)

$R_{1024} = [2^{1023}, 2^{1024})$ with $Exp = 2^{11} - 2 = 2046$, $E = Exp - 1023 = 1023$

$$
\begin{array}{cccl}
S & 111\cdots1110 & 0000\,0000\cdots0000\,0000 & (1+0)\times 2^{E} = (1+0)\times 2^{1023}\\
S & 111\cdots1110 & 0000\,0000\cdots0000\,0001 & (1+2^{-52})\times 2^{1023}\\
\vdots & \vdots & \vdots & \vdots\\
S & 111\cdots1110 & 1111\,1111\cdots1111\,1111 & \{(1+(2^{52}-1)\,2^{-52}) = (2-2^{-52})\}\times 2^{1023}
\end{array}
$$

Value of LSB: $2^{E-52} = 2^{1023-52} = 2^{971}$
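As a check on the exponent field and the LSB value of the largest normalized range, here is a minimal Python sketch (Python 3.9+ for `math.ulp` and `math.nextafter`). The helper `exp_field` is a hypothetical name used only for illustration.

```python
import math
import struct

def exp_field(x):
    """Hypothetical helper: return the 11-bit biased exponent field Exp of a double."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return (bits >> 52) & 0x7FF

low = 2.0**1023                       # (1 + 0) x 2^1023, first number in R_1024
high = (2.0 - 2.0**-52) * 2.0**1023   # (2 - 2^-52) x 2^1023, last number in R_1024

print(exp_field(low), exp_field(high))            # both have Exp = 2046, so E = 1023
print(math.ulp(low) == 2.0**(1023 - 52))          # LSB value is 2^971 in this range
print(math.nextafter(low, math.inf) - low == 2.0**971)
```

The gap between neighbouring doubles near the top of the range is about $2 \times 10^{292}$. This illustrates that the absolute spacing grows with the magnitude, while the relative spacing stays at $2^{-52}$.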
5. $\pm\infty$ (inf): $Exp = 2^{11} - 1 = 2047$, $E = Exp - 1023 = 1024$ (meaningless)

$$
\begin{array}{cccl}
0 & 111\cdots1111 & 0000\,0000\cdots0000\,0000 & +\infty \neq (1+0)\times 2^{E} = (1+0)\times 2^{1024}\\
1 & 111\cdots1111 & 0000\,0000\cdots0000\,0000 & -\infty \neq -(1+0)\times 2^{E} = -(1+0)\times 2^{1024}\\
S & 111\cdots1111 & 0000\,0000\cdots0000\,0001 & \text{invalid: NaN (not a number)}\\
\vdots & \vdots & \vdots & \vdots\\
S & 111\cdots1111 & 1111\,1111\cdots1111\,1111 & \text{invalid: NaN (not a number)}
\end{array}
$$
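You can build the special patterns with $Exp = 2047$ bit by bit. Below is a minimal Python sketch that assumes the platform's `float` is an IEEE 754 double, which holds for CPython on common hardware. The helpers `to_bits` and `from_bits` are hypothetical names for illustration.

```python
import math
import struct

def to_bits(x):
    """Hypothetical helper: reinterpret a 64-bit double as an unsigned 64-bit integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def from_bits(b):
    """Hypothetical helper: reinterpret an unsigned 64-bit integer as a 64-bit double."""
    return struct.unpack(">d", struct.pack(">Q", b))[0]

EXP_ALL_ONES = 0x7FF << 52   # exponent field Exp = 2^11 - 1 = 2047, fraction = 0

pos_inf = from_bits(EXP_ALL_ONES)                    # 0 111...1111 000...000
neg_inf = from_bits((1 << 63) | EXP_ALL_ONES)        # 1 111...1111 000...000
nan_low = from_bits(EXP_ALL_ONES | 1)                # S 111...1111 000...001
nan_high = from_bits(EXP_ALL_ONES | ((1 << 52) - 1)) # S 111...1111 111...111

print(pos_inf, neg_inf, nan_low, nan_high)
print(to_bits(math.inf) == EXP_ALL_ONES)
```

An all-zero fraction field gives $\pm\infty$. Any nonzero fraction field gives a NaN, whatever the sign bit.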

From the ranges described above, the minimum and maximum positive finite numbers are, respectively,

$$
\begin{aligned}
f_{\min} &= (0 + 2^{-52}) \times 2^{-1022} = 2^{-1074} = 4.9406564584124654 \times 10^{-324}\\
f_{\max} &= (2 - 2^{-52}) \times 2^{1023} = 1.7976931348623157 \times 10^{308}
\end{aligned}
$$
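The book checks these limits with its MATLAB program. A rough Python equivalent, assuming Python 3.9+ for `math.ulp`, computes both limits from the formulas and compares them with the interpreter's own constants:

```python
import math
import sys

f_min = (0 + 2.0**-52) * 2.0**-1022    # smallest positive (denormalized) double
f_max = (2.0 - 2.0**-52) * 2.0**1023   # largest finite double

print(f"{f_min:.16e}")   # 4.9406564584124654e-324
print(f"{f_max:.16e}")   # 1.7976931348623157e+308
print(f_min == 2.0**-1074, f_min == math.ulp(0.0), f_max == sys.float_info.max)
print(f_min / 2, f_max * 2)   # underflows to 0.0, overflows to inf
```

Halving $f_{\min}$ falls exactly halfway between 0 and $2^{-1074}$. Round-to-nearest-even then gives 0. Doubling $f_{\max}$ exceeds the largest finite double and overflows to $+\infty$.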

           This can be checked by running the program “nm119_8.m” in Section 1.1.9.
Now, to get some idea of how floating-point arithmetic is carried out, let's see how the addition of two numbers, 3 and 14, represented in the IEEE 64-bit floating-point number system, is performed.

