As DNNs have been scaled up and improved, they have become much more complex. A new challenge has therefore emerged: how to configure such systems? Human engineers can optimize a handful of configuration parameters through experimentation, but DNNs have complex topologies and hundreds of hyperparameters. Moreover, such design choices matter; success often depends on finding the right architecture for the problem. Much of the recent work in deep learning has indeed focused on proposing different hand-designed architectures for new problems [5-8].
The complexity challenge is not unique to neural networks. Software and many other engineered systems have become too complex for humans to optimize fully. As a result, a new way of thinking about such design has started to emerge. In this approach, humans are responsible for the high-level design, and the details are left for computational optimization systems to figure out. For instance, humans write the overall design of a software system, and the parameters and low-level code are optimized automatically [9]; humans write imperfect versions of programs, and evolutionary algorithms are then used to repair them [10]; humans define the space of possible web designs, and evolution is used to find effective ones [11].
This same approach can be applied to the design of DNN architectures. This problem includes three challenges: how to design the components of the architecture, how to put them together into a full network topology, and how to set the hyperparameters for the components and the global design. These three aspects need to be optimized separately for each new task.
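To make these three aspects concrete, the sketch below shows one possible genome-style representation in Python. The class and field names (LayerGene, NetworkGenome, global_hparams) are illustrative assumptions only, not the encoding used by the system described in this chapter.

    # Illustrative sketch: the three design aspects captured in a single genome.
    # Names and fields are hypothetical, not taken from the chapter's system.
    from dataclasses import dataclass, field
    from typing import Dict, List, Tuple

    @dataclass
    class LayerGene:
        """One evolvable component, e.g., a convolutional or recurrent layer."""
        layer_type: str                                           # e.g., "conv", "dense", "lstm"
        hparams: Dict[str, float] = field(default_factory=dict)   # e.g., filter count, dropout rate

    @dataclass
    class NetworkGenome:
        """A candidate DNN design covering all three aspects to be optimized."""
        components: List[LayerGene]                    # (1) the building blocks
        topology: List[Tuple[int, int]]                # (2) directed connections between components
        global_hparams: Dict[str, float] = field(default_factory=dict)  # (3) e.g., learning rate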
This chapter develops an approach for the automatic design of DNNs. It is based on the existing neuroevolution technique of NeuroEvolution of Augmenting Topologies (NEAT) [12], which has in the past been successful in evolving topologies and weights of relatively small recurrent networks. In this chapter, NEAT is extended to the coevolutionary optimization of components, topologies, and hyperparameters. The fitness of the evolved networks is determined by how well they can be trained, through gradient descent, to perform the task. The approach is demonstrated in the standard benchmark tasks of object recognition and language modeling, and in a real-world application of captioning images on a magazine website.
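As a rough illustration of this training-based fitness evaluation, the toy sketch below evolves a simple design by scoring each candidate according to how well it trains with gradient descent. The synthetic task, the two-parameter design space (hidden width and learning rate), and the mutation scheme are all assumptions made for the sketch; the system described in this chapter evolves full component topologies rather than two scalars.

    # Toy illustration (not the chapter's system): fitness = performance after
    # brief gradient-descent training. A "design" here is just (hidden_width,
    # learning_rate) for a tiny NumPy MLP on a synthetic classification task.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(256, 8))
    y = (X.sum(axis=1, keepdims=True) > 0).astype(float)

    def fitness(design, steps=200):
        """Train briefly with gradient descent; return accuracy as fitness."""
        hidden, lr = design
        W1 = rng.normal(scale=0.1, size=(8, hidden)); b1 = np.zeros(hidden)
        W2 = rng.normal(scale=0.1, size=(hidden, 1)); b2 = np.zeros(1)
        for _ in range(steps):
            h = np.tanh(X @ W1 + b1)
            p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))      # sigmoid output
            g = (p - y) / len(X)                          # cross-entropy gradient w.r.t. logits
            gh = (g @ W2.T) * (1.0 - h ** 2)              # backprop through tanh
            W2 -= lr * (h.T @ g); b2 -= lr * g.sum(axis=0)
            W1 -= lr * (X.T @ gh); b1 -= lr * gh.sum(axis=0)
        p = 1.0 / (1.0 + np.exp(-(np.tanh(X @ W1 + b1) @ W2 + b2)))
        return float(((p > 0.5) == y).mean())

    def mutate(design):
        """Perturb the hidden width and learning rate of a parent design."""
        hidden, lr = design
        return (max(2, int(hidden) + int(rng.integers(-4, 5))),
                float(lr * np.exp(rng.normal(0.0, 0.3))))

    population = [(int(rng.integers(2, 33)), 0.1) for _ in range(8)]
    for generation in range(5):
        ranked = sorted(population, key=fitness, reverse=True)
        population = ranked[:4] + [mutate(d) for d in ranked[:4]]  # elitism + mutation
    print("best design (hidden width, learning rate):", max(population, key=fitness))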
The results show that the approach discovers designs that are comparable to the state of the art, and does so automatically without much development effort. The approach is computationally extremely demanding; with more computational power, it is likely to be more effective and would possibly surpass human design. Such power is now becoming available in various forms of cloud computing and grid computing, thereby making evolutionary optimization of neural networks a promising approach for the future.



2. BACKGROUND AND RELATED WORK
Neuroevolution techniques have been applied successfully to sequential decision tasks for three decades [13-16]. In such tasks there is no gradient available, so instead of gradient descent, evolution is used to optimize the weights of the