Page 238 - Compact Numerical Methods For Computers
18.4. A COMPARISON OF FUNCTION MINIMISATION AND
NONLINEAR LEAST-SQUARES METHODS
It is quite difficult to compare the four basic algorithms presented in the preceding
chapters in order to say that one of them is ‘best.’ The reason for this is that one
algorithm may solve some problems very easily but run out of space on another.
An algorithm which requires gradients may be harder to use than another which
needs only function values, hence requiring more human time and effort even
though it saves computer time. Alternatively, it may only be necessary to improve
an approximation to the minimum, so that a method which will continue until it
has exhausted all the means by which it may reduce the function value may very
well be unnecessarily persistent.
Despite all these types of question, I have run an extensive comparison of the
methods which have been presented. In this, each method was applied to 79 test
problems, all of which were specified in a sum-of-squares form
S = f^T f.    (18.27)
Gradients, where required, were computed via

g = 2 J^T f.    (18.28)
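As an illustrative sketch (not the book's own code), equations (18.27) and (18.28) can be evaluated directly from the residual vector f and its Jacobian J. The Rosenbrock residuals used below are my own choice of example problem, not one taken from this section:

```python
# Sketch: sum-of-squares value S = f^T f  (18.27) and its
# gradient g = 2 J^T f  (18.28), computed from residuals f and
# Jacobian J[i][j] = d f_i / d b_j. The Rosenbrock residuals are
# an illustrative assumption.

def residuals(b):
    # f1 = 10*(b2 - b1^2), f2 = 1 - b1  (classic Rosenbrock form)
    return [10.0 * (b[1] - b[0] ** 2), 1.0 - b[0]]

def jacobian(b):
    # Partial derivatives of the residuals above.
    return [[-20.0 * b[0], 10.0],
            [-1.0, 0.0]]

def sumsq(b):
    # S = f^T f   (18.27)
    f = residuals(b)
    return sum(fi * fi for fi in f)

def gradient(b):
    # g = 2 J^T f   (18.28)
    f = residuals(b)
    J = jacobian(b)
    return [2.0 * sum(J[i][j] * f[i] for i in range(len(f)))
            for j in range(len(b))]

print(sumsq([1.0, 1.0]))      # 0.0 at the minimum (1, 1)
print(gradient([1.0, 1.0]))   # [0.0, 0.0] at the minimum
```

Framing the test set this way means a single pair of routines (residuals and Jacobian) serves both the function-minimisation and the nonlinear least-squares codes.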
Nash (1976) presents 77 of these problems, some of which have been used as
examples in this text†. The convergence criteria in each algorithm were as
stringent as possible to allow execution to continue as long as the function was
being reduced. In some cases, this meant that the programs had to be stopped
manually when convergence was extremely slow. Some judgement was then used
to decide if a satisfactory solution had been found. In programs using numerical
approximation to the derivative, the same kind of rather subjective judgement
was used to decide whether a solution was acceptable based on the computed
gradient. This means that the results given below for minimisation methods
employing the approximation detailed in §18.2 have used a more liberal definition
of success than that which was applied to programs using analytic or no derivatives. The question this raises is: do we judge a program by what it is intended to
do? Or do we test to see if it finds some globally correct answer? I have chosen
the former position, but the latter is equally valid. Such differences of viewpoint
unfortunately give rise to many disputes and controversies in this field.
Having now described what has been done, a measure of success is needed. For
reliability, we can compare the number of problems for which acceptable (rather
than correct) solutions have been found to the total number of problems run.
Note that this total is not necessarily 79 because some problems have an initial set
of parameters corresponding to a local maximum and the methods which use
gradient information will not generally be able to proceed from such points. Other
problems may be too large to fit in the machine used for the tests, a partition of a
Data General NOVA which uses 23-bit binary arithmetic. Also some of the
algorithms have intermediate steps which cause overflow or underflow to occur
because of the particular scale of the problems at hand.
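The reliability measure just described is a simple ratio; a minimal sketch follows, with hypothetical counts that are not the book's results:

```python
# Sketch of the reliability measure: acceptable (rather than
# correct) solutions divided by problems actually attempted.
# The counts used here are hypothetical, for illustration only.

def reliability(acceptable, attempted):
    # attempted may be less than 79: some problems cannot be
    # started (e.g. gradient methods at a local maximum) or do
    # not fit in the machine, or over/underflow at some step.
    return acceptable / attempted

print(reliability(70, 75))  # hypothetical counts
```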
† The two remaining functions are given by Osborne (1972).