Page 41 - Applied Numerical Methods Using MATLAB


30 MATLAB USAGE AND COMPUTATIONAL ERRORS
$R_0 = [2^0, 2^1)$ with $Exp = 2^{10} - 1 = 1023$, $E = Exp - 1023 = 0$
S 011...1111 0000 0000 ... 0000 0000    $(1 + 0) \times 2^E = (1 + 0) \times 2^0 = 1$
S 011...1111 0000 0000 ... 0000 0001    $(1 + 2^{-52}) \times 2^0$
. . . . . . . . . . . . . . . . . . . . .
S 011...1111 1111 1111 ... 1111 1111    $\{(1 + (2^{52} - 1)\,2^{-52}) = (2 - 2^{-52})\} \times 2^0$
Value of LSB: $\Delta_0 = 2^{-52}$
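The three bit patterns listed for this range can be inspected directly. The following is a minimal Python sketch (not the book's MATLAB code); the helper name `fields` is hypothetical, and the sketch assumes the host uses IEEE 754 doubles, as virtually all current platforms do.

```python
import struct

def fields(x):
    """Split a double into (sign, biased exponent Exp, 52-bit mantissa) as integers."""
    (n,) = struct.unpack(">Q", struct.pack(">d", x))
    return n >> 63, (n >> 52) & 0x7FF, n & ((1 << 52) - 1)

one = 1.0                   # first row: mantissa all zeros
next_up = 1.0 + 2.0**-52    # second row: one LSB above 1
last = 2.0 - 2.0**-52       # last row: largest double below 2

print(fields(one))       # (0, 1023, 0)
print(fields(next_up))   # (0, 1023, 1)
print(fields(last))      # (0, 1023, 4503599627370495), mantissa = 2**52 - 1
```

All three values share the biased exponent 1023 (so E = 0) and differ only in the mantissa field, which is what the table rows express.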
4. The Largest Normalized Range (with the value of hidden bit $b_h = 1$)
$R_{1023} = [2^{1023}, 2^{1024})$ with $Exp = 2^{11} - 2 = 2046$, $E = Exp - 1023 = 1023$
S 111...1110 0000 0000 ... 0000 0000    $(1 + 0) \times 2^E = (1 + 0) \times 2^{1023}$
S 111...1110 0000 0000 ... 0000 0001    $(1 + 2^{-52}) \times 2^{1023}$
. . . . . . . . . . . . . . . . . . . . .
S 111...1110 1111 1111 ... 1111 1111    $\{(1 + (2^{52} - 1)\,2^{-52}) = (2 - 2^{-52})\} \times 2^{1023}$

Value of LSB: $\Delta_{1023} = 2^{1023-52} = 2^{971}$
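As a cross-check on this range, the rows can be rebuilt from their bit fields. This is a Python sketch under the assumption of IEEE 754 doubles; `from_bits` is a hypothetical helper. With a 52-bit mantissa and E = 1023, adjacent doubles in $[2^{1023}, 2^{1024})$ should be $2^{1023-52} = 2^{971}$ apart.

```python
import struct

def from_bits(n):
    """Interpret a 64-bit unsigned integer as an IEEE 754 double."""
    (x,) = struct.unpack(">d", struct.pack(">Q", n))
    return x

EXP = 2046                                      # biased exponent 111...1110
first = from_bits(EXP << 52)                    # mantissa all zeros
second = from_bits((EXP << 52) | 1)             # mantissa = 1
largest = from_bits((EXP << 52) | (2**52 - 1))  # mantissa all ones

print(first == 2.0**1023)                         # True
print(second - first == 2.0**971)                 # True: one LSB in this range
print(largest == (2.0 - 2.0**-52) * 2.0**1023)    # True
```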
5. ±∞ (inf): $Exp = 2^{11} - 1 = 2047$, $E = Exp - 1023 = 1024$ (meaningless)
0 111...1111 0000 0000 ... 0000 0000    $+\infty \ne (1 + 0) \times 2^E = (1 + 0) \times 2^{1024}$
1 111...1111 0000 0000 ... 0000 0000    $-\infty \ne -(1 + 0) \times 2^E = -(1 + 0) \times 2^{1024}$
S 111...1111 0000 0000 ... 0000 0001    invalid as a number (not used for values; IEEE 754 reserves these patterns for NaN)
. . . . . . . . . . . . . . . . . . . . .
S 111...1111 1111 1111 ... 1111 1111    invalid as a number (not used for values; IEEE 754 reserves these patterns for NaN)
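The patterns with Exp = 2047 can be decoded the same way. The sketch below is Python, assumes IEEE 754 doubles, and uses a hypothetical helper `from_bits`. It shows that a zero mantissa decodes to ±inf, while a nonzero mantissa does not decode to any number: IEEE 754 treats such patterns as NaN.

```python
import math
import struct

def from_bits(n):
    """Interpret a 64-bit unsigned integer as an IEEE 754 double."""
    (x,) = struct.unpack(">d", struct.pack(">Q", n))
    return x

EXP = 2047                                        # biased exponent 111...1111
pos_inf = from_bits(EXP << 52)                    # sign 0, mantissa 0
neg_inf = from_bits((1 << 63) | (EXP << 52))      # sign 1, mantissa 0
all_ones = from_bits((EXP << 52) | (2**52 - 1))   # mantissa all ones

print(pos_inf, neg_inf)       # inf -inf
print(math.isnan(all_ones))   # True: not a number
```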

From what has been mentioned earlier, we know that the minimum and maximum
positive numbers are, respectively,

$f_{\min} = (0 + 2^{-52}) \times 2^{-1022} = 2^{-1074} = 4.9406564584124654 \times 10^{-324}$
$f_{\max} = (2 - 2^{-52}) \times 2^{1023} = 1.7976931348623157 \times 10^{308}$
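These two limits are easy to confirm numerically. The following is a small Python sketch, not the book's program, and it assumes IEEE 754 doubles. Note that `sys.float_info.min` is the smallest normalized double, $2^{-1022}$, so it is not the same as $f_{\min}$ here.

```python
import math
import sys

f_min = math.ldexp(1.0, -1074)          # 2**-1074, smallest positive (denormalized) double
f_max = (2.0 - 2.0**-52) * 2.0**1023    # largest finite double

print(f"{f_min:.16e}")   # 4.9406564584124654e-324
print(f"{f_max:.16e}")   # 1.7976931348623157e+308

print(f_min / 2)                      # 0.0: nothing representable lies between 0 and f_min
print(f_max * 2)                      # inf: overflow past f_max
print(f_max == sys.float_info.max)    # True
```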

This can be checked by running the program “nm119_8.m” in Section 1.1.9.
Now, to gain some idea of how arithmetic is actually carried out, let’s see
how the addition of two numbers, 3 and 14, represented in the IEEE 64-bit
floating-point number system, is performed.
