Page 18 - Compact Numerical Methods For Computers

                            memory, not in the working registers where extra digits may be carried. On a
                            Hewlett-Packard 9830, for instance, it was necessary when determining the
                            so-called ‘split precision’ to store numbers specifically in array elements to force
                            the appropriate truncation.
                              The above discussion has assumed a model of floating-point arithmetic which may
                            be termed an additive form in that powers of the radix are added together and the
                            entire sum multiplied by some power of the radix (the exponent) to provide the final
                            quantity representing the desired real number. This representation may or may not
                            be exact. For example, the fraction 1/10 cannot be exactly represented in additive binary
                            (radix 2) floating-point arithmetic. While there are other models of floating-point
                            arithmetic, the additive form is the most common, and is used in the IEEE binary
                            and radix-free floating-point arithmetic standards. (The March 1981 issue of IEEE
                            Computer magazine, volume 14, number 3, pages 51-86, contains a lucid description of
                            the binary standard and its motivations.)
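The point about inexact representation can be checked directly. The following is a minimal sketch in Python (not a language used in this book), assuming the interpreter's float is an IEEE binary double, as it is on essentially all current machines. The names stored, exact and error are introduced here purely for illustration.

```python
from fractions import Fraction

# Fraction(0.1) recovers, as an exact ratio of integers, the binary
# double that is actually stored when we write 0.1.
stored = Fraction(0.1)
exact = Fraction(1, 10)

# The stored value is a ratio whose denominator is a power of two,
# so it cannot equal 1/10.
error = stored - exact

print(stored == exact)        # False
print(stored.denominator)     # a power of the radix, 2
print(float(error))           # small and positive, but not zero
```

The stored number is the nearest double to 1/10, not 1/10 itself; the discrepancy is tiny, yet it is present before any arithmetic has been performed.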
                              If we are concerned with having absolute upper and lower bounds on computed
                            quantities, interval arithmetic is possible, but it is not commonly supported by programming
                            languages (Pascal SC (Kulisch 1987) is one exception). Despite the obvious importance of
                           assured bounds on results, the perceived costs of using interval arithmetic have
                            largely prevented its widespread use.
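To make the idea of assured bounds concrete, here is a rough sketch of interval addition in Python. It is not the Pascal SC facility and not a full interval arithmetic. It assumes Python 3.9 or later (for math.nextafter) and round-to-nearest binary doubles, and it simply pushes each computed bound outward by one unit in the last place, which is cruder than the directed rounding a proper implementation would use. The function add and the names tenth and total are hypothetical.

```python
import math

def add(x, y):
    """Add two intervals (lo, hi), widening each bound outward by one ulp.

    A round-to-nearest sum is wrong by at most half an ulp, so stepping
    one ulp outward is enough to keep the true sum inside the result.
    """
    lo = math.nextafter(x[0] + y[0], -math.inf)
    hi = math.nextafter(x[1] + y[1], math.inf)
    return (lo, hi)

# An interval that certainly contains the real number 1/10.
tenth = (math.nextafter(0.1, -math.inf), math.nextafter(0.1, math.inf))

# Add it up ten times; the true answer, 1, must lie inside the result.
total = (0.0, 0.0)
for _ in range(10):
    total = add(total, tenth)

print(total)
print(total[0] <= 1.0 <= total[1])
```

Every operation does twice the work and the bounds widen at each step, which gives some feeling for the perceived costs mentioned above.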
                              The development of standards for floating-point arithmetic has the great benefit
                            that results of similar calculations on different machinery should be the same.
                            Furthermore, manufacturers have been prompted to develop hardware implementations
                            of these standards, notably the Intel 80x87 family and the Motorola 68881
                            circuit devices. Hewlett-Packard implemented a decimal version of the IEEE 854
                            standard in their HP 71B calculator.
                              Despite such developments, there continues to be much confusion and misinformation
                            concerning floating-point arithmetic. Because an additive decimal form of
                            arithmetic can represent fractions such as 1/10 exactly, and in general avoid
                            input-output conversion errors, developers of software products using such arithmetic
                            (usually in binary coded decimal or BCD form) have been known to claim that it has
                            'no round-off error', which is patently false. I personally prefer decimal arithmetic, in
                            that data entered into a calculation can generally be represented exactly, so that a
                           display of the stored raw data reproduces the input familiar to the user. Nevertheless,
                            the differences between good implementations of floating-point arithmetic, whether
                            binary or decimal, are rarely substantive.
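That decimal arithmetic is not free of round-off is easily demonstrated. The sketch below uses Python's standard decimal module as a stand-in for a BCD machine; the ten-digit precision is an arbitrary assumption, and decimal_demo is a name invented for this illustration.

```python
from decimal import Decimal, localcontext

def decimal_demo(digits=10):
    """Return (1/3, 3*(1/3), 0.1 + 0.2) computed in `digits`-figure decimal arithmetic."""
    with localcontext() as ctx:
        ctx.prec = digits                      # working precision in decimal digits
        third = Decimal(1) / Decimal(3)        # must be rounded: 1/3 is not a finite decimal
        tenths = Decimal("0.1") + Decimal("0.2")
        return third, third * 3, tenths

third, product, tenths = decimal_demo()
print(third)                     # 0.3333333333
print(product)                   # 0.9999999999
print(product == 1)              # False: round-off error in decimal arithmetic
print(tenths == Decimal("0.3"))  # True: decimal input is held exactly
```

The last line shows the genuine advantage of a decimal radix, namely that data entered in decimal form is stored without conversion error; the third line shows that this is not the same thing as having no round-off.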
                              While the subject of machine arithmetic is still warm, note that the mean of two
                            numbers may be calculated to be smaller or greater than either! An example in
                            four-figure decimal arithmetic will serve as an illustration of this.

                                            Exact           Rounded               Truncated

                            a               5008            5008                  5008
                            b               5007            5007                  5007
                            a + b           10015           1002 * 10             1001 * 10
                            (a + b)/2       5007.5          501.0 * 10            500.5 * 10
                                                            = 5010                = 5005
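The table can be reproduced mechanically. The following sketch again borrows Python's decimal module to imitate a four-figure decimal machine; four_figure_mean is a hypothetical helper, and the choice of ROUND_HALF_UP for the 'Rounded' column is an assumption (round-half-even gives the same result for these data).

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP

def four_figure_mean(a, b, rounding):
    """Compute (a + b)/2, rounding every intermediate result to four figures."""
    ctx = Context(prec=4, rounding=rounding)
    total = ctx.add(Decimal(a), Decimal(b))   # 10015 does not fit in four figures
    return ctx.divide(total, Decimal(2))

rounded = four_figure_mean(5008, 5007, ROUND_HALF_UP)
truncated = four_figure_mean(5008, 5007, ROUND_DOWN)

print(int(rounded))     # 5010, greater than both a and b
print(int(truncated))   # 5005, smaller than both a and b
```

The damage is done in forming a + b, which needs five figures; the division by 2 is carried out exactly in both cases.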