
FIGURE 15.2
Top: Simplified visualization of the best network evolved by CoDeepNEAT for the CIFAR-10 domain. Node 1 is the input layer, while Node 2 is the output layer. The network has a repetitive structure because its blueprint reuses the same module in multiple places. Bottom: A more detailed visualization of the same network.


The networks discovered by CoDeepNEAT also train quickly. While the network of Snoek et al. takes over 30 epochs to reach 20% test error and over 200 epochs to converge, the best evolved network takes only 12 epochs to reach 20% test error and around 120 epochs to converge. This network instantiates the same modules multiple times, resulting in the deep, repetitive structure typical of many successful DNNs (Fig. 15.2).



4. EVOLUTION OF LSTM ARCHITECTURES

Recurrent neural networks, in particular those utilizing LSTM nodes, are another powerful approach to DNNs. Much of their power comes from the repetition of LSTM modules and the connectivity between them. In this section, CoDeepNEAT is extended with mutations that allow searching for such connectivity, and the approach is evaluated on the standard benchmark task of language modeling.
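As a rough illustration of what such a connectivity-search mutation could look like, the sketch below represents a stack of LSTM layers together with a set of optional skip connections and mutates that set. The genome representation, class names, and mutation operators are hypothetical, not the chapter's actual implementation.

import random

class LSTMConnectivityGenome:
    """Hypothetical genome: a stack of LSTM layers plus optional skip connections."""
    def __init__(self, num_layers):
        self.num_layers = num_layers
        self.skips = set()  # pairs (src, dst) with dst > src + 1

    def mutate_add_skip(self):
        # Add a feed-forward skip connection between two non-adjacent layers.
        candidates = [(i, j) for i in range(self.num_layers)
                      for j in range(i + 2, self.num_layers)
                      if (i, j) not in self.skips]
        if candidates:
            self.skips.add(random.choice(candidates))

    def mutate_remove_skip(self):
        # Disable a previously added skip connection.
        if self.skips:
            self.skips.remove(random.choice(sorted(self.skips)))

# Example: start from a plain four-layer stack and apply one structural mutation.
genome = LSTMConnectivityGenome(num_layers=4)
genome.mutate_add_skip()
print(genome.skips)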

4.1 EXTENDING CoDeepNEAT to LSTMs
An LSTM consists of gated memory cells that can integrate information over longer time scales (as compared to simply using recurrent connections in a neural network). LSTMs have recently been shown to be powerful in supervised sequence-processing tasks such as speech recognition [35] and machine translation [36].
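For reference, the update of a standard ("vanilla") LSTM memory cell can be written as follows, using common notation (\(\sigma\) is the logistic sigmoid and \(\odot\) the elementwise product); these are the widely used equations rather than a formulation specific to this chapter:

\begin{aligned}
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) &&\text{(input gate)}\\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) &&\text{(forget gate)}\\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) &&\text{(output gate)}\\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) &&\text{(candidate cell state)}\\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t, \qquad h_t = o_t \odot \tanh(c_t)
\end{aligned}

The gated cell state \(c_t\) can carry information across many time steps, which is what allows LSTMs to integrate information over longer time scales.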
Recent research on LSTMs has focused on two directions: finding variations of the individual LSTM memory unit architecture [37–40] and discovering new ways of stitching LSTM layers into a network [41–43]. Both approaches have improved performance over vanilla LSTMs, with the best recent results achieved through