Page 306 - Artificial Intelligence in the Age of Neural Networks and Brain Computing
3. Evolution of Deep Learning Architectures
one of 10 classes. For comparison, the neural network layer types were restricted to
those used by Snoek et al. [34] in their Bayesian optimization of CNN hyperparameters. Also following Snoek et al., data augmentation consisted of converting the
images from RGB to HSV color space, adding random perturbations, distortions,
and crops, and converting them back to RGB color space.
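This augmentation scheme can be sketched as follows. The function name, the per-pixel `colorsys` conversion, and the parameter defaults are illustrative assumptions only; the actual pipeline used by Snoek et al. operates on whole image batches:

```python
import colorsys
import random

def augment_hsv(image, hue_shift=45, sv_shift=0.5, sv_scale=0.5):
    """Randomly perturb an RGB image in HSV space.

    `image` is a nested list of (r, g, b) pixels with values in [0, 1].
    `hue_shift` is in degrees; `sv_shift` and `sv_scale` bound the additive
    and multiplicative saturation/value perturbations, mirroring the ranges
    of the evolved global hyperparameters in Table 15.1 (an assumption).
    """
    dh = random.uniform(-hue_shift, hue_shift) / 360.0
    ds = random.uniform(-sv_shift, sv_shift)
    vs = 1.0 + random.uniform(-sv_scale, sv_scale)
    out = []
    for row in image:
        new_row = []
        for (r, g, b) in row:
            # Convert to HSV, apply the shared random perturbation,
            # clamp, and convert back to RGB.
            h, s, v = colorsys.rgb_to_hsv(r, g, b)
            h = (h + dh) % 1.0
            s = min(max(s + ds, 0.0), 1.0)
            v = min(max(v * vs, 0.0), 1.0)
            new_row.append(colorsys.hsv_to_rgb(h, s, v))
        out.append(new_row)
    return out
```

The random cropping, distortion, and horizontal flipping mentioned above would be applied alongside this color-space perturbation.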
CoDeepNEAT was initialized with populations of 25 blueprints and 45 modules.
From these two populations, 100 CNNs were assembled for fitness evaluation in
every generation. Each node in the module chromosome represents a convolutional
layer. Its hyperparameters determine the various properties of the layer and whether
additional max-pooling or dropout layers are attached (Table 15.1). In addition, a set
of global hyperparameters were evolved for the assembled network. During fitness
evaluation, the 50,000 images were split into a training set of 42,500 samples and a
validation set of 7500 samples. Since training a DNN is computationally very expen-
sive, each network was trained for eight epochs on the training set. The validation set
was then used to determine classification accuracy, that is, the fitness of the network.
After 72 generations of evolution, the best network in the population was returned.
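The fitness-evaluation step can be sketched as below. The `ToyNetwork` stand-in and the method names are assumptions made for illustration, not the CoDeepNEAT implementation; the split fraction 0.15 reproduces the 42,500/7,500 division of the 50,000 images:

```python
import random

class ToyNetwork:
    """Stand-in for an assembled CNN: predicts the majority class seen so far."""
    def __init__(self):
        self.counts = {}

    def train_one_epoch(self, data):
        for _, y in data:
            self.counts[y] = self.counts.get(y, 0) + 1

    def predict(self, x):
        return max(self.counts, key=self.counts.get) if self.counts else 0

def evaluate_fitness(network, images, labels, epochs=8, val_fraction=0.15):
    """Train briefly and return validation accuracy as the fitness score.

    Training is cut off after a few epochs because fully training each of
    the 100 candidate networks per generation would be too expensive.
    """
    data = list(zip(images, labels))
    random.shuffle(data)
    n_val = int(len(data) * val_fraction)
    val_set, train_set = data[:n_val], data[n_val:]
    for _ in range(epochs):
        network.train_one_epoch(train_set)
    correct = sum(network.predict(x) == y for x, y in val_set)
    return correct / len(val_set)
```

In the actual experiment, this score drives selection: each generation, the 100 assembled CNNs are evaluated this way and their fitness is attributed back to the blueprints and modules they were built from.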
After evolution was complete, the best network was trained on all 50,000 training
images for 300 epochs, and the classification error measured. This error was 7.3%,
comparable to the 6.4% error reported by Snoek et al. [34]. Interestingly, because
only limited training could be done during evolution, the best network evolved by
Table 15.1 Node and Global Hyperparameters Evolved in the CIFAR-10 Domain

Node hyperparameters             Range
  Number of filters              [32, 256]
  Dropout rate                   [0, 0.7]
  Initial weight scaling         [0, 2.0]
  Kernel size                    {1, 3}
  Max-pooling                    {True, False}

Global hyperparameters           Range
  Learning rate                  [0.0001, 0.1]
  Momentum                       [0.68, 0.99]
  Hue shift                      [0, 45]
  Saturation/value shift         [0, 0.5]
  Saturation/value scale         [0, 0.5]
  Cropped image size             [26, 32]
  Spatial scaling                [0, 0.3]
  Random horizontal flips        {True, False}
  Variance normalization         {True, False}
  Nesterov accelerated gradient  {True, False}
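As an illustration only, the search space of Table 15.1 could be encoded and sampled along the following lines. All names and the encoding are assumptions; CoDeepNEAT evolves these values through mutation and crossover rather than sampling them independently, and a log-uniform distribution would be more typical for the learning rate than the uniform draw used here:

```python
import random

# Hypothetical encoding of the Table 15.1 search space:
# continuous/integer ranges as (low, high) tuples, discrete choices as lists.
NODE_SPACE = {
    "num_filters": (32, 256),
    "dropout_rate": (0.0, 0.7),
    "weight_scaling": (0.0, 2.0),
    "kernel_size": [1, 3],
    "max_pooling": [True, False],
}

GLOBAL_SPACE = {
    "learning_rate": (0.0001, 0.1),
    "momentum": (0.68, 0.99),
    "hue_shift": (0, 45),
    "sv_shift": (0.0, 0.5),
    "sv_scale": (0.0, 0.5),
    "cropped_image_size": (26, 32),
    "spatial_scaling": (0.0, 0.3),
    "horizontal_flips": [True, False],
    "variance_normalization": [True, False],
    "nesterov": [True, False],
}

def sample(space):
    """Draw one setting per hyperparameter from its range or choice set."""
    genome = {}
    for name, spec in space.items():
        if isinstance(spec, tuple):
            lo, hi = spec
            if isinstance(lo, int) and isinstance(hi, int):
                genome[name] = random.randint(lo, hi)   # integer range
            else:
                genome[name] = random.uniform(lo, hi)   # continuous range
        else:
            genome[name] = random.choice(spec)          # discrete choice
    return genome
```

Each node in a module chromosome would carry one `NODE_SPACE` setting (defining its convolutional layer and any attached max-pooling or dropout), while a single `GLOBAL_SPACE` setting would apply to the whole assembled network.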