Page 30 - Matrix Analysis & Applied Linear Algebra


22 Chapter 1 Linear Equations

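to look at digit $d_{t+1}$ in $x = .d_1 d_2 \cdots d_t d_{t+1} \cdots \times 10^{\epsilon}$ (making sure $d_1 \neq 0$) and
then set
$$
\mathrm{fl}(x) =
\begin{cases}
.d_1 d_2 \cdots d_t \times 10^{\epsilon} & \text{if } d_{t+1} < 5, \\
\left([.d_1 d_2 \cdots d_t] + 10^{-t}\right) \times 10^{\epsilon} & \text{if } d_{t+1} \geq 5.
\end{cases}
$$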
For example, in 2-digit, base-10 floating-point arithmetic,
$$
\mathrm{fl}(3/80) = \mathrm{fl}(.0375) = \mathrm{fl}(.375 \times 10^{-1}) = .38 \times 10^{-1} = .038.
$$
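The rounding rule can be sketched in Python as follows. This is only an illustration under stated assumptions: the helper name `fl` and its parameter `t` are chosen here to mirror the notation, and exact rational arithmetic (`fractions.Fraction`) is used so that the decimal rounding is not disturbed by the machine's binary floating point.

```python
from fractions import Fraction
import math

def fl(x, t):
    """Round x to t significant base-10 digits (hypothetical helper).

    A (t+1)-th digit of 5 or more rounds the magnitude up; otherwise
    the trailing digits are dropped.
    """
    x = Fraction(x)
    if x == 0:
        return x
    sign = 1 if x > 0 else -1
    mag = abs(x)
    # Find e with 10**(e-1) <= mag < 10**e, i.e. mag = .d1 d2 ... x 10**e, d1 != 0.
    e = 0
    while mag >= Fraction(10) ** e:
        e += 1
    while mag < Fraction(10) ** (e - 1):
        e -= 1
    # Shift so that d1 ... dt sit to the left of the decimal point.
    scaled = mag * Fraction(10) ** (t - e)
    # Adding 1/2 and taking the floor rounds up exactly when d_{t+1} >= 5.
    n = math.floor(scaled + Fraction(1, 2))
    return sign * n * Fraction(10) ** (e - t)

result = fl(Fraction(3, 80), 2)
print(float(result))
```

Passing a `Fraction` (or a string such as `"0.0375"`) rather than a Python float keeps the input exact; a float would be converted to its binary value first, which may differ slightly from the decimal number intended.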
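By considering $\eta = 1/3$ and $\xi = 3$ with $t$-digit base-10 arithmetic, it's
easy to see that
$$
\mathrm{fl}(\eta + \xi) \neq \mathrm{fl}(\eta) + \mathrm{fl}(\xi) \quad\text{and}\quad \mathrm{fl}(\eta\xi) \neq \mathrm{fl}(\eta)\,\mathrm{fl}(\xi).
$$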
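To make this concrete, the following sketch evaluates both sides for $t = 3$. The helper `fl` is a hypothetical rounding routine written for this illustration (exact rationals, with a digit of 5 or more rounding up); it is an assumption, not code from the text.

```python
from fractions import Fraction
import math

def fl(x, t):
    """Round x to t significant base-10 digits (hypothetical helper)."""
    x = Fraction(x)
    if x == 0:
        return x
    mag, e = abs(x), 0
    while mag >= Fraction(10) ** e:      # find e with 10**(e-1) <= mag < 10**e
        e += 1
    while mag < Fraction(10) ** (e - 1):
        e -= 1
    n = math.floor(mag * Fraction(10) ** (t - e) + Fraction(1, 2))
    return (1 if x > 0 else -1) * n * Fraction(10) ** (e - t)

t = 3
eta, xi = Fraction(1, 3), Fraction(3)

sum_then_round = fl(eta + xi, t)          # fl(10/3) = 3.33
round_then_sum = fl(eta, t) + fl(xi, t)   # .333 + 3 = 3.333
prod_then_round = fl(eta * xi, t)         # fl(1) = 1
round_then_prod = fl(eta, t) * fl(xi, t)  # .333 * 3 = .999

print(float(sum_then_round), float(round_then_sum))
print(float(prod_then_round), float(round_then_prod))
```

In both cases the discrepancy comes from rounding $1/3$ to $.333$ before the operation rather than after it.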

Furthermore, several familiar rules of real arithmetic do not hold for floating-point
arithmetic; associativity is one outstanding example. This, among other
reasons, makes the analysis of floating-point computation difficult. It also means
that you must be careful when working the examples and exercises in this text.
Most calculators and computers can be instructed to display varying numbers of
digits, but most have a fixed internal precision with which all calculations are
made before numbers are displayed, and this internal precision cannot be altered.
Almost certainly, the internal precision of your calculator or computer is greater
than the precision called for by the examples and exercises in this text. This
means that each time you perform a $t$-digit calculation, you should manually
round the result to $t$ significant digits and reenter the rounded number before
proceeding to the next calculation. In other words, don't "chain" operations in
your calculator or computer.
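The failure of associativity, and the advice to round after every operation, can be illustrated with a short sketch. The numbers 100, .4, and .4 are chosen here for illustration and do not come from the text; `fl` and `fl_add` are hypothetical helpers that round to $t$ significant digits after each step.

```python
from fractions import Fraction
import math

def fl(x, t):
    """Round x to t significant base-10 digits (hypothetical helper)."""
    x = Fraction(x)
    if x == 0:
        return x
    mag, e = abs(x), 0
    while mag >= Fraction(10) ** e:      # find e with 10**(e-1) <= mag < 10**e
        e += 1
    while mag < Fraction(10) ** (e - 1):
        e -= 1
    n = math.floor(mag * Fraction(10) ** (t - e) + Fraction(1, 2))
    return (1 if x > 0 else -1) * n * Fraction(10) ** (e - t)

def fl_add(a, b, t):
    """Add, then immediately round to t digits: no chaining of operations."""
    return fl(Fraction(a) + Fraction(b), t)

t = 3
a, b, c = Fraction(100), Fraction(4, 10), Fraction(4, 10)

left = fl_add(fl_add(a, b, t), c, t)    # 100.4 -> 100, then 100.4 -> 100
right = fl_add(a, fl_add(b, c, t), t)   # .8 stays .8, then 100.8 -> 101

print(left, right)
```

Grouping the two small terms first lets their sum survive the rounding, which is why the two orders of evaluation disagree in 3-digit arithmetic.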
To understand how to execute Gaussian elimination using floating-point
arithmetic, let's compare the use of exact arithmetic with the use of 3-digit
base-10 arithmetic to solve the following system:
$$
\begin{aligned}
47x + 28y &= 19, \\
89x + 53y &= 36.
\end{aligned}
$$
Using Gaussian elimination with exact arithmetic, we multiply the first equation
by the multiplier $m = 89/47$ and subtract the result from the second equation
to produce
$$
\left(\begin{array}{cc|c}
47 & 28 & 19 \\
0 & -1/47 & 1/47
\end{array}\right).
$$
Back substitution yields the exact solution
$$
x = 1 \quad\text{and}\quad y = -1.
$$
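The exact computation can be checked with rational arithmetic. The sketch below is one possible way to do so, using Python's `fractions.Fraction`; the variable names `row1`, `row2`, and `m` are chosen for this illustration.

```python
from fractions import Fraction

# Augmented rows [a, b | c] for 47x + 28y = 19 and 89x + 53y = 36.
row1 = [Fraction(47), Fraction(28), Fraction(19)]
row2 = [Fraction(89), Fraction(53), Fraction(36)]

# Eliminate x from the second equation: row2 <- row2 - m * row1.
m = row2[0] / row1[0]                                # multiplier 89/47
row2 = [r2 - m * r1 for r1, r2 in zip(row1, row2)]

# Back substitution.
y = row2[2] / row2[1]
x = (row1[2] - row1[1] * y) / row1[0]

print(row2, x, y)
```

Because every intermediate value is an exact fraction, no rounding occurs and the second row reduces to exactly $(0,\ -1/47 \mid 1/47)$.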
Using 3-digit arithmetic, the multiplier is
$$
\mathrm{fl}(m) = \mathrm{fl}\left(\frac{89}{47}\right) = .189 \times 10^{1} = 1.89.
$$
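One way to reproduce this value on a computer, without rounding by hand, is to use a decimal arithmetic context limited to three significant digits. The sketch below assumes Python's standard `decimal` module as a stand-in for a 3-digit machine; the name `three_digit` is chosen here for illustration.

```python
from decimal import Context, Decimal, ROUND_HALF_UP

# A context that keeps 3 significant digits and rounds ties away from zero,
# which matches rounding up whenever the first discarded digit is 5 or more.
three_digit = Context(prec=3, rounding=ROUND_HALF_UP)

# The quotient 89/47 = 1.8936... is rounded once, to 3 digits.
m = three_digit.divide(Decimal(89), Decimal(47))
print(m)
```

Each operation performed through the context is rounded to three digits before the next one starts, so operations are never chained at a higher precision.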
