Page 238 - Compact Numerical Methods For Computers
P. 238

226               Compact numerical methods for computers
                                  18.4. A COMPARISON OF FUNCTION MINIMISATION AND
                                           NONLINEAR LEAST-SQUARES METHODS
                            It is quite difficult to compare the four basic algorithms presented in the preceding
                            chapters in order to say that one of them is ‘best.’ The reason for this is that one
                            algorithm may solve some problems very easily but run out of space on another.
                            An algorithm which requires gradients may be harder to use than another which
                            needs only function values, hence requiring more human time and effort even
                            though it saves computer time. Alternatively, it may only be necessary to improve
                            an approximation to the minimum, so that a method which will continue until it
                            has exhausted all the means by which it may reduce the function value may very
                            well be unnecessarily persistent.
                              Despite all these types of question, I have run an extensive comparison of the
                            methods which have been presented. In this, each method was applied to 79 test
                            problems, all of which were specified in a sum-of-squares form
                                                                 T
                                                             S=f f.                          (18.27)
                            Gradients, where required, were computed via
                                                             g=-J f .                        (18.28)
                                                                 T
                            Nash (1976) presents 77 of these problems, some of which have been used as
                            examples in this text†. The convergence criteria in each algorithm were as
                            stringent as possible to allow execution to continue as long as the function was
                            being reduced. In some cases, this meant that the programs had to be stopped
                            manually when convergence was extremely slow. Some judgement was then used
                            to decide if a satisfactory solution had been found. In programs using numerical
                            approximation to the derivative, the same kind of rather subjective judgement
                            was used to decide whether a solution was acceptable based on the computed
                            gradient. This means that the results given below for minimisation methods
                            employing the approximation detailed in §18.2 have used a more liberal definition
                            of success than that which was applied to programs using analytic or no deriva-
                            tives. The question this raises is: do we judge a program by what it is intended to
                            do? Or do we test to see if it finds some globally correct answer? I have chosen
                            the former position, but the latter is equally valid. Such differences of viewpoint
                            unfortunately give rise to many disputes and controversies in this field.
                              Having now described what has been done, a measure of success is needed. For
                            reliability, we can compare the number of problems for which acceptable (rather
                            than correct) solutions have been found to the total number of problems run.
                            Note that this total is not necessarily 79 because some problems have an initial set
                            of parameters corresponding to a local maximum and the methods which use
                            gradient information will not generally be able to proceed from such points. Other
                            problems may be too large to fit in the machine used for the tests-a partition of a
                            Data General NOVA which uses 23-bit binary arithmetic. Also some of the
                            algorithms have intermediate steps which cause overflow or underflow to occur
                            because of the particular scale of the problems at hand.
                            † The two remaining functions are given by Osborne (1972).
   233   234   235   236   237   238   239   240   241   242   243