Page 18 - Compact Numerical Methods For Computers

memory, not in the working registers where extra digits may be carried. On a
Hewlett-Packard 9830, for instance, it was necessary when determining the
so-called ‘split precision’ to store numbers specifically in array elements to force
the appropriate truncation.
The above discussion has assumed a model of floating-point arithmetic which may
be termed an additive form in that powers of the radix are added together and the
entire sum multiplied by some power of the radix (the exponent) to provide the final
quantity representing the desired real number. This representation may or may not
be exact. For example, the fraction 1/10 cannot be exactly represented in additive binary
(radix 2) floating-point arithmetic. While there are other models of floating-point
arithmetic, the additive form is the most common, and is used in the IEEE binary
and radix-free floating-point arithmetic standards. (The March 1981 issue of IEEE
Computer magazine, volume 14, number 3, pages 51-86, contains a lucid description of
the binary standard and its motivations.)
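The inexactness of the additive binary form can be checked directly. The following is a minimal sketch in Python, taking one tenth as the illustrative fraction and assuming the host machine uses IEEE binary double precision; the variable names are chosen here for illustration only.

```python
from decimal import Decimal
from fractions import Fraction

exact = Fraction(1, 10)      # the real number one tenth
stored = Fraction(0.1)       # exact value of the nearest binary double

# The stored denominator is a power of two, so the value cannot equal 1/10.
is_power_of_two = (stored.denominator & (stored.denominator - 1)) == 0
error = stored - exact       # small, but not zero

print(stored == exact)       # compares the stored value with the true one
print(is_power_of_two)
print(Decimal(0.1))          # full decimal expansion of the stored value
print(float(error))
```

Because every finite binary fraction has a power of two as its denominator, no choice of word length removes this representation error; it only makes it smaller.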
If we are concerned with having absolute upper and lower bounds on computed
quantities, interval arithmetic is possible, but it is not commonly supported by
programming languages (Pascal SC (Kulisch 1987) is one exception). Despite the obvious importance of
assured bounds on results, the perceived costs of using interval arithmetic have
largely prevented its widespread use.
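The idea of carrying bounds through a calculation may be sketched as follows. This is not a full interval package: the helper names `enclose` and `interval_add` are hypothetical, only addition is shown, and outward rounding is imitated by stepping each bound one representable number outward with `math.nextafter` (available from Python 3.9), which assumes the default round-to-nearest arithmetic.

```python
import math
from fractions import Fraction

def enclose(value):
    """Return (lo, hi) bracketing the real number that `value` approximates."""
    return (math.nextafter(value, -math.inf), math.nextafter(value, math.inf))

def interval_add(x, y):
    """Add two intervals, pushing each computed bound outward by one step."""
    lo = math.nextafter(x[0] + y[0], -math.inf)
    hi = math.nextafter(x[1] + y[1], math.inf)
    return (lo, hi)

tenth = enclose(0.1)                      # brackets the true 1/10
total = interval_add(interval_add(tenth, tenth), tenth)

# The exact sum 3/10 should lie inside the computed bounds.
contains = Fraction(total[0]) <= Fraction(3, 10) <= Fraction(total[1])
print(total, contains)
```

Each operation widens the interval a little, which hints at the cost referred to above: the bounds are assured, but they can grow pessimistic over a long computation.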
The development of standards for floating-point arithmetic has the great benefit
that results of similar calculations on different machinery should be the same.
Furthermore, manufacturers have been prompted to develop hardware implementations
of these standards, notably the Intel 80x87 family of circuit devices and the
Motorola 68881. Hewlett-Packard implemented a decimal version of the IEEE 854
standard in their HP 71B calculator.
Despite such developments, there continues to be much confusion and misinformation
concerning floating-point arithmetic. Because an additive decimal form of
arithmetic can represent fractions such as 1/10 exactly, and in general avoid
input-output conversion errors, developers of software products using such arithmetic
(usually in binary coded decimal or BCD form) have been known to claim that it has
'no round-off error', which is patently false. I personally prefer decimal arithmetic, in
that data entered into a calculation can generally be represented exactly, so that a
display of the stored raw data reproduces the input familiar to the user. Nevertheless,
the differences between good implementations of floating-point arithmetic, whether
binary or decimal, are rarely substantive.
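That decimal arithmetic still rounds can be seen with a short sketch using Python's `decimal` module. The ten-digit precision and the function name `decimal_roundoff_demo` are assumptions made for this illustration; they do not model any particular BCD product.

```python
from decimal import Decimal, localcontext

def decimal_roundoff_demo(digits=10):
    """Show one exact and one inexact calculation in decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = digits
        third = Decimal(1) / Decimal(3)      # must be rounded to `digits` figures
        back = third * 3                     # does not recover 1
        tenths = sum([Decimal("0.1")] * 10, Decimal(0))  # input held exactly
    return third, back, tenths

third, back, tenths = decimal_roundoff_demo()
print(third, back, back == 1)    # round-off is present in decimal
print(tenths, tenths == 1)       # decimal input data are represented exactly
print(sum([0.1] * 10) == 1.0)    # the same sum in binary double precision
```

The decimal form keeps the entered data exact, which is the advantage described above, but division by three shows that round-off error has not disappeared.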
While the subject of machine arithmetic is still warm, note that the mean of two
numbers may be calculated to be smaller or greater than either! An example in
four-figure decimal arithmetic will serve as an illustration of this.

             Exact     Rounded       Truncated

a            5008      5008          5008
b            5007      5007          5007
a + b        10015     1002 * 10     1001 * 10
(a + b)/2    5007.5    501.0 * 10    500.5 * 10
                       = 5010        = 5005
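The four-figure example can be reproduced with Python's `decimal` module, as in the sketch below. The function name `four_figure_mean` is hypothetical, and the rounding modes `ROUND_HALF_EVEN` and `ROUND_DOWN` are assumed to stand in for the 'rounded' and 'truncated' arithmetic of the example.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_EVEN

def four_figure_mean(a, b, rounding):
    """Compute (a + b) / 2 with every result held to four decimal figures."""
    ctx = Context(prec=4, rounding=rounding)
    total = ctx.add(Decimal(a), Decimal(b))   # 10015 needs five figures
    return ctx.divide(total, Decimal(2))

rounded = four_figure_mean(5008, 5007, ROUND_HALF_EVEN)
truncated = four_figure_mean(5008, 5007, ROUND_DOWN)

print(int(rounded))      # mean under rounding
print(int(truncated))    # mean under truncation
```

The damage is done in forming the sum, which needs five figures: once its last digit is lost, the halved result falls outside the range spanned by the two numbers, above it under rounding and below it under truncation.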
