




                  one of 10 classes. For comparison, the neural network layer types were restricted to
                  those used by Snoek et al. [34] in their Bayesian optimization of CNN hyperparameters.
                  Also following Snoek et al., data augmentation consisted of converting the images from
                  RGB to HSV color space, adding random perturbations, distortions, and crops, and
                  converting them back to RGB color space.
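                     As an illustration, the following Python sketch implements this kind of HSV-space
                  perturbation for 32 x 32 CIFAR-10 images. The function name and the specific perturbation
                  parameters are illustrative choices for a minimal sketch, not the exact settings used by
                  Snoek et al. or in the experiments described here.

                  # Minimal sketch of HSV-space augmentation for 32x32 RGB images in [0, 1].
                  # Parameter values are illustrative, not the evolved or published settings.
                  import numpy as np
                  from skimage.color import rgb2hsv, hsv2rgb

                  def augment(image, rng, hue_shift=18, sv_shift=0.1, sv_scale=0.1, crop=28):
                      """Perturb one image in HSV space, then crop and convert back to RGB."""
                      hsv = rgb2hsv(image)
                      # Random hue shift (in degrees, mapped to the [0, 1] hue channel).
                      hsv[..., 0] = (hsv[..., 0] + rng.uniform(-hue_shift, hue_shift) / 360.0) % 1.0
                      # Random saturation/value scale and shift, clipped back to [0, 1].
                      for c in (1, 2):
                          hsv[..., c] = np.clip(hsv[..., c] * rng.uniform(1 - sv_scale, 1 + sv_scale)
                                                + rng.uniform(-sv_shift, sv_shift), 0.0, 1.0)
                      rgb = hsv2rgb(hsv)
                      # Random crop to crop x crop pixels, plus an optional horizontal flip.
                      top = rng.integers(0, 32 - crop + 1)
                      left = rng.integers(0, 32 - crop + 1)
                      rgb = rgb[top:top + crop, left:left + crop]
                      if rng.random() < 0.5:
                          rgb = rgb[:, ::-1]
                      return rgb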
                     CoDeepNEAT was initialized with populations of 25 blueprints and 45 modules.
                  From these two populations, 100 CNNs were assembled for fitness evaluation in
                  every generation. Each node in the module chromosome represents a convolutional
                  layer. Its hyperparameters determine the various properties of the layer and whether
                  additional max-pooling or dropout layers are attached (Table 15.1). In addition, a set
                  of global hyperparameters was evolved for the assembled network. During fitness
                  evaluation, the 50,000 images were split into a training set of 42,500 samples and a
                  validation set of 7500 samples. Since training a DNN is computationally very expensive,
                  each network was trained for eight epochs on the training set. The validation set
                  was then used to determine classification accuracy, that is, the fitness of the network.
                  After 72 generations of evolution, the best network in the population was returned.
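                     The following Python sketch illustrates one way such a fitness evaluation could be
                  implemented with a Keras-style model. Here assemble_network is a hypothetical stand-in
                  for CoDeepNEAT's assembly of a CNN from a blueprint and its modules; the optimizer and
                  batch size are assumptions, not settings reported in the original experiments.

                  # Minimal sketch of the fitness evaluation step for one assembled CNN.
                  def evaluate_fitness(individual, x_train, y_train):
                      # Hold out 7,500 of the 50,000 training images for validation.
                      x_fit, y_fit = x_train[:42500], y_train[:42500]
                      x_val, y_val = x_train[42500:], y_train[42500:]

                      model = assemble_network(individual)  # hypothetical blueprint/module assembly
                      model.compile(optimizer="sgd",
                                    loss="sparse_categorical_crossentropy",
                                    metrics=["accuracy"])

                      # Training is limited to eight epochs to keep evolution tractable.
                      model.fit(x_fit, y_fit, epochs=8, batch_size=128, verbose=0)

                      # Validation accuracy serves as the fitness of the network.
                      _, accuracy = model.evaluate(x_val, y_val, verbose=0)
                      return accuracy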
                     After evolution was complete, the best network was trained on all 50,000 training
                  images for 300 epochs, and the classification error measured. This error was 7.3%,
                  comparable to the 6.4% error reported by Snoek et al. [34]. Interestingly, because
                  only limited training could be done during evolution, the best network evolved by



                  Table 15.1 Node and Global Hyperparameters Evolved in the
                  CIFAR-10 Domain
                   Node Hyperparameters                 Range
                   Number of filters                    [32, 256]
                   Dropout rate                         [0, 0.7]
                   Initial weight scaling               [0, 2.0]
                   Kernel size                          {1, 3}
                   Max-pooling                          {True, False}
                   Global Hyperparameters               Range
                   Learning rate                        [0.0001, 0.1]
                   Momentum                             [0.68, 0.99]
                   Hue shift                            [0, 45]
                   Saturation/value shift               [0, 0.5]
                   Saturation/value scale               [0, 0.5]
                   Cropped image size                   [26, 32]
                   Spatial scaling                      [0, 0.3]
                   Random horizontal flips              {True, False}
                   Variance normalization               {True, False}
                   Nesterov accelerated gradient        {True, False}
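                  To make the role of the node hyperparameters concrete, the following Python sketch
                  shows one plausible way a single module node could be decoded into Keras-style layers.
                  The dictionary keys are illustrative names for the evolved hyperparameters in Table 15.1,
                  not identifiers from the original implementation.

                  # Minimal sketch: decode one evolved module node into a convolutional block.
                  from tensorflow.keras import layers

                  def decode_node(node, x):
                      """Append the convolutional block described by one module node to tensor x."""
                      x = layers.Conv2D(filters=node["num_filters"],      # range [32, 256]
                                        kernel_size=node["kernel_size"],  # {1, 3}
                                        padding="same",
                                        activation="relu")(x)
                      if node["max_pooling"]:                             # {True, False}
                          x = layers.MaxPooling2D(pool_size=2)(x)
                      if node["dropout_rate"] > 0:                        # range [0, 0.7]
                          x = layers.Dropout(node["dropout_rate"])(x)
                      return x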