
2. Background and Related Work




                  neural network. Neuroevolution is particularly well suited to POMDP (partially
                  observable Markov decision process) problems because of recurrence: recurrent
                  connections can be evolved to disambiguate hidden states.
                     The weights can be optimized using various evolutionary techniques. Genetic al-
                  gorithms are a natural choice because crossover is a good match with neural net-
                  works: they recombine parts of existing neural networks to find better ones.
                  CMA-ES [17], a technique for continuous optimization, also works well for optimizing
                  the weights because it can capture interactions between them. Other ap-
                  proaches such as SANE, ESP, and CoSyNE evolve partial neural networks and
                  combine them into fully functional networks [18–20]. Further, techniques such as
                  Cellular Encoding [21] and NEAT [12] have been developed to evolve the topology
                  of the neural network, which is particularly effective in determining the required
                  recurrence. Neuroevolution techniques have been shown to work well in many tasks
                  in control, robotics, constructing intelligent agents for games, and artificial life [14].
                  However, because of the large number of weights to be optimized, they are generally
                  limited to relatively small networks.
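
                     As a concrete illustration, the following is a minimal sketch of weight-only
                  neuroevolution for a fixed-topology network, using elitist selection, one-point
                  crossover, and Gaussian mutation. The fitness function and all parameter values
                  below are hypothetical placeholders for a task-specific evaluation.

    import numpy as np

    rng = np.random.default_rng(0)

    N_WEIGHTS = 64       # size of a small fixed-topology controller network
    POP_SIZE = 50
    ELITE = 5
    MUT_STD = 0.1
    GENERATIONS = 100

    def fitness(weights):
        # Hypothetical placeholder: run the network on the task and return a score.
        return -float(np.sum(weights ** 2))

    def crossover(a, b):
        # One-point crossover recombines parts of two parent weight vectors.
        point = rng.integers(1, N_WEIGHTS)
        return np.concatenate([a[:point], b[point:]])

    def mutate(w):
        # Gaussian perturbation of every weight.
        return w + rng.normal(0.0, MUT_STD, size=w.shape)

    population = [rng.normal(0, 1, N_WEIGHTS) for _ in range(POP_SIZE)]
    for gen in range(GENERATIONS):
        ranked = sorted(population, key=fitness, reverse=True)
        elites = ranked[:ELITE]
        children = []
        while len(children) < POP_SIZE - ELITE:
            pa, pb = rng.choice(ELITE, size=2, replace=False)
            children.append(mutate(crossover(elites[pa], elites[pb])))
        population = elites + children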
                     Evolution has been combined with gradient descent-based learning in several
                  ways, making it possible to utilize much larger networks. These methods are still
                  usually applied to sequential decision tasks, but gradients from a related task
                  (such as prediction of the next sensory inputs) are used to help search. Much of
                  the work is based on utilizing the Baldwin effect, where learning only affects the se-
                  lection [22]. Computationally, it is possible to utilize Lamarckian evolution as well,
                  that is, encode the learned weight changes back into the genome [21]. However, care
                  must be taken to maintain diversity so that evolution can continue to innovate when
                  all individuals are learning similar behavior.
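
                     The distinction can be made concrete in a short sketch: in the Baldwinian
                  variant, lifetime learning changes only the fitness used for selection, whereas in
                  the Lamarckian variant the learned weights are copied back into the genome. The
                  learn and fitness functions below are hypothetical stand-ins for a task-specific
                  gradient learner and evaluation.

    import numpy as np

    rng = np.random.default_rng(0)

    def learn(weights, steps=10, lr=0.01):
        # Placeholder gradient learner: a few descent steps on a surrogate loss ||w||^2.
        w = weights.copy()
        for _ in range(steps):
            grad = 2 * w
            w -= lr * grad
        return w

    def fitness(weights):
        # Placeholder objective for illustration.
        return -float(np.sum(weights ** 2))

    def evaluate(genome, lamarckian=False):
        learned = learn(genome)     # lifetime learning using gradients from a related task
        score = fitness(learned)    # selection is based on post-learning performance
        if lamarckian:
            return score, learned   # Lamarckian: write learned weights back into the genome
        return score, genome        # Baldwinian: learning shapes selection only

    population = [rng.normal(0, 1, 32) for _ in range(20)]
    scored = [evaluate(g, lamarckian=False) for g in population]
    # Selection, crossover, and mutation would then proceed on the returned genomes,
    # exactly as in plain neuroevolution.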
                     Evolution of DNNs departs from this prior work in that it is applied to supervised
                  domains where gradients are available, and evolution is used only to optimize the
                  design of the neural network. Deep neuroevolution is thus more closely related to
                  bilevel (or multilevel) optimization techniques [23]. The idea is to use an evolu-
                  tionary optimization process at a high level to optimize the parameters of a low-
                  level evolutionary optimization process.
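
                     A minimal sketch of this bilevel idea, under the assumption of a simple
                  surrogate task, is given below: an outer evolutionary loop searches over the
                  hyperparameters (mutation rate and strength, population and elite size) of an
                  inner neuroevolution run, and the fitness of a hyperparameter setting is the
                  performance achieved by the inner run it configures.

    import numpy as np

    rng = np.random.default_rng(0)

    def run_inner_ga(mutation_rate, mutation_std, pop_size, elite_frac,
                     n_weights=16, gens=20):
        # Low-level GA: evolve a weight vector on a surrogate objective and return
        # the best fitness found. In a real benchmark this would be the fitness of
        # the evolved controller network.
        def fit(w):
            return -float(np.sum(w ** 2))
        pop = [rng.normal(0, 1, n_weights) for _ in range(pop_size)]
        n_elite = max(1, int(elite_frac * pop_size))
        for _ in range(gens):
            pop.sort(key=fit, reverse=True)
            elites, children = pop[:n_elite], []
            while len(children) < pop_size - n_elite:
                parent = elites[rng.integers(n_elite)]
                mask = rng.random(n_weights) < mutation_rate
                children.append(parent + mask * rng.normal(0, mutation_std, n_weights))
            pop = elites + children
        return max(fit(w) for w in pop)

    def random_hyperparams():
        return {"mutation_rate": rng.uniform(0.05, 0.5),
                "mutation_std": rng.uniform(0.05, 1.0),
                "pop_size": int(rng.integers(20, 80)),
                "elite_frac": rng.uniform(0.05, 0.3)}

    def perturb(hp):
        # Multiplicative noise keeps each hyperparameter within a sensible range.
        new = dict(hp)
        new["mutation_rate"] = float(np.clip(hp["mutation_rate"] * rng.lognormal(0, 0.2), 0.05, 0.5))
        new["mutation_std"] = float(np.clip(hp["mutation_std"] * rng.lognormal(0, 0.2), 0.05, 1.0))
        new["pop_size"] = int(np.clip(round(hp["pop_size"] * rng.lognormal(0, 0.2)), 20, 80))
        new["elite_frac"] = float(np.clip(hp["elite_frac"] * rng.lognormal(0, 0.2), 0.05, 0.3))
        return new

    # Outer loop: each individual is a hyperparameter setting; its fitness is the
    # result of the inner GA it configures.
    outer_pop = [random_hyperparams() for _ in range(8)]
    for _ in range(5):
        outer_pop.sort(key=lambda hp: run_inner_ga(**hp), reverse=True)
        parents = outer_pop[:3]
        outer_pop = parents + [perturb(parents[i % 3]) for i in range(5)]
    print(outer_pop[0])   # a strong hyperparameter setting from the final population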
                     Consider for instance the problem of controlling a helicopter through aileron,
                  elevator, rudder, and rotor inputs. This is a challenging benchmark from the
                  2000s for which various reinforcement learning approaches have been developed
                  [24–26]. One of the most successful ones is single-level neuroevolution, where
                  the helicopter is controlled by a neural network that is evolved through genetic al-
                  gorithms [27]. The eight parameters of the neuroevolution method (such as mutation
                  and crossover rates, probabilities, and amounts, as well as population and elite sizes) are
                  optimized by hand. It would be difficult to include more parameters because the
                  parameters interact nonlinearly. A large part of the parameter space thus remains un-
                  explored in the single-level neuroevolution approach. However, a bilevel approach,
                  where a high-level evolutionary process is employed to optimize these parameters,
                  can search this space more effectively [28]. With bilevel evolution, the number of pa-
                  rameters optimized could be extended to 15, which would result in a significantly