Page 74 - Applied Numerical Methods Using MATLAB

This can be confirmed by typing either of the following statements into the
MATLAB command window.
>>fprintf('3 = %bx\n',3) or >>format hex, 3, format short

which will print out onto the screen

0000000000000840 4008000000000000
Noting that the more significant byte (8 bits = 2 hexadecimal digits) of a
number is stored at the higher memory address in the INTEL system, we can
reverse the order of the bytes in this number so that the most significant
byte is on the left and the least significant byte is on the right, as we
usually write numbers in daily life.

00 00 00 00 00 00 08 40 → 40 08 00 00 00 00 00 00

This is exactly the hexadecimal representation of the number 3, as we
expected. You can find the IEEE 64-bit floating-point number representation
of the number 14 and use the command fprintf() or format hex to check
whether the result is right.
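
The byte reversal described above can also be checked programmatically. The following is a minimal sketch, not taken from the text: it assumes a little-endian machine (as in the INTEL system mentioned above) and uses the built-in functions num2hex(), typecast(), and dec2hex(); the variable names are illustrative only.

```matlab
% Minimal sketch (assumes a little-endian machine; variable names are illustrative)
x = 3;

% Hexadecimal string with the most significant byte first
hexBig = num2hex(x);

% The 8 bytes of the double in memory order (lowest address first)
bytes = typecast(x, 'uint8');

% Two hex digits per byte, concatenated in memory order
memHex = lower(reshape(dec2hex(double(bytes), 2).', 1, []));

% Reverse the byte order so the most significant byte comes first
revHex = lower(reshape(dec2hex(double(fliplr(bytes)), 2).', 1, []));

fprintf('num2hex      : %s\n', hexBig);
fprintf('memory order : %s\n', memHex);
fprintf('reversed     : %s\n', revHex);
```

On a little-endian machine the reversed string should agree with the output of num2hex(). Replacing x = 3 with x = 14 gives one way to check the representation you worked out by hand.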

<procedure of adding 2^{-1} to 2^3>
    1.0000 × 2^3                     1.00000 × 2^3
  + 1.0000 × 2^{-1}   alignment →  + 0.00010 × 2^3
                                     1.00010 × 2^3
  truncation of guard bit →          1.0001 × 2^3 = (1 + 2^{-4}) × 2^3   (right result)

<procedure of subtracting 2^{-1} from 2^3>
    1.0000 × 2^3                     1.00000 × 2^3                       1.00000 × 2^3
  − 1.0000 × 2^{-1}   alignment →  − 0.00010 × 2^3   2's complement →  + 1.11110 × 2^3
                                                                         0.11110 × 2^3
  normalization, truncation of guard bit →   1.1110 × 2^2 = (1 + 1 − 2^{-3}) × 2^2   (right result)

<procedure of adding 2^{-2} to 2^3>
    1.0000 × 2^3                     1.00000 × 2^3
  + 1.0000 × 2^{-2}   alignment →  + 0.00001 × 2^3
                                     1.00001 × 2^3
  truncation of guard bit →          1.0000 × 2^3 = (1 + 0) × 2^3   (no difference)

<procedure of subtracting 2^{-2} from 2^3>
    1.0000 × 2^3                     1.00000 × 2^3                       1.00000 × 2^3
  − 1.0000 × 2^{-2}   alignment →  − 0.00001 × 2^3   2's complement →  + 1.11111 × 2^3
                                                                         0.11111 × 2^3
  normalization, truncation of guard bit →   1.1111 × 2^2 = (1 + 1 − 2^{-4}) × 2^2   (right result)

(cf) The leading bit before the binary point is the hidden bit; the fifth
fraction bit of the aligned operands is the guard bit.
Figure P1.18 Procedure of addition/subtraction with four mantissa bits.
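
The four cases in Figure P1.18 can be imitated numerically. The following is a rough sketch under stated assumptions, not the book's code: it models four mantissa (fraction) bits plus one guard bit with truncation, performs the subtraction directly instead of through the 2's complement, and uses hypothetical names (nMant, nGuard, cases, results).

```matlab
% Rough sketch of Figure P1.18 (hypothetical names; truncation toward zero)
nMant  = 4;                 % fraction bits kept in the result
nGuard = 1;                 % extra guard bit used during alignment
a = 2^3;
cases = [2^-1, +1;          % each row: [b, sign], meaning a + sign*b
         2^-1, -1;
         2^-2, +1;
         2^-2, -1];
ops = '+-';
results = zeros(size(cases, 1), 1);

for k = 1:size(cases, 1)
    b   = cases(k, 1);
    sgn = cases(k, 2);

    % Exponent of a in the form 1.f x 2^ea (log2 returns f in [0.5, 1))
    [~, ea] = log2(a);
    ea = ea - 1;

    % Alignment: express b with exponent ea, keeping bits down to the guard bit
    wGuard   = 2^(ea - (nMant + nGuard));
    bAligned = fix(b / wGuard) * wGuard;

    % Add or subtract the aligned operands (exact in double precision here)
    s = a + sgn * bAligned;

    % Normalization: exponent of the result, then truncation to nMant bits
    [~, es] = log2(abs(s));
    es = es - 1;
    wLast = 2^(es - nMant);
    results(k) = fix(s / wLast) * wLast;

    fprintf('2^3 %c %g = %g\n', ops((3 - sgn) / 2), b, results(k));
end
```

In this model, adding 2^{-2} leaves 2^3 unchanged, while subtracting 2^{-2} still changes the result, because normalization lowers the exponent of the difference to 2 and the last retained bit then has weight 2^{-2}.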
1.18 Resolution of Number Representation and Quantization Error
In Section 1.2.1, we have seen that adding 2^{-22} to 2^{30} makes some
difference, while adding 2^{-23} to 2^{30} makes no difference due to the bit
shift by over 52 bits for alignment before addition. How about subtracting
2^{-23} from 2^{30}? In contrast with the addition of 2^{-23} to 2^{30}, it
makes a difference as you can see by typing the following statement into the
MATLAB
