DeepNEAT differs from NEAT in that each node in the chromosome no longer
represents a neuron, but a layer in a DNN. Each node contains a table of real- and
binary-valued hyperparameters that are mutated through uniform Gaussian distribution
and random bit-flipping, respectively. These hyperparameters determine the type of
layer (such as convolutional, fully connected, or recurrent) and the properties of that
layer (such as number of neurons, kernel size, and activation function). The edges in
the chromosome are no longer marked with weights; instead they simply indicate
how the nodes (layers) are connected together. To construct a DNN from a Deep-
NEAT chromosome, one simply needs to traverse the chromosome graph, replacing
each node with the corresponding layer. The chromosome also contains a set of
global hyperparameters applicable to the entire network (such as learning rate,
training algorithm, and data preprocessing).
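As a rough illustration of this representation, the following Python sketch shows one way such a chromosome could be encoded and turned into a network by traversing its graph in topological order; the names LayerNode, Chromosome, build_network, and the layer_factory callback are hypothetical and not part of DeepNEAT itself.

import random
from dataclasses import dataclass, field

@dataclass
class LayerNode:
    # A node stands for a whole layer, not a single neuron.
    layer_type: str                                     # e.g., "conv", "dense", "lstm"
    real_params: dict = field(default_factory=dict)     # e.g., {"num_filters": 64.0, "kernel_size": 3.0}
    binary_params: dict = field(default_factory=dict)   # e.g., {"use_dropout": True}

    def mutate(self, sigma=0.1, flip_prob=0.05):
        # Real-valued hyperparameters are perturbed with Gaussian noise,
        # binary-valued hyperparameters are flipped at random.
        for key in self.real_params:
            self.real_params[key] += random.gauss(0.0, sigma)
        for key in self.binary_params:
            if random.random() < flip_prob:
                self.binary_params[key] = not self.binary_params[key]

@dataclass
class Chromosome:
    nodes: dict          # node_id -> LayerNode
    edges: list          # (src_id, dst_id) pairs; connectivity only, no weights
    global_params: dict  # e.g., {"learning_rate": 1e-3, "optimizer": "adam"}

def topological_order(nodes, edges):
    # Kahn's algorithm: a layer is instantiated only after all of its parents.
    indegree = {n: 0 for n in nodes}
    for _, dst in edges:
        indegree[dst] += 1
    order = []
    frontier = [n for n, d in indegree.items() if d == 0]
    while frontier:
        n = frontier.pop()
        order.append(n)
        for src, dst in edges:
            if src == n:
                indegree[dst] -= 1
                if indegree[dst] == 0:
                    frontier.append(dst)
    return order

def build_network(chromosome, layer_factory):
    # Traverse the graph, replacing each node with a concrete layer.
    # layer_factory(node, parent_outputs) is an assumed callback that maps a
    # LayerNode and the outputs of its parents to a framework-specific layer.
    outputs = {}
    for node_id in topological_order(chromosome.nodes, chromosome.edges):
        parents = [outputs[src] for src, dst in chromosome.edges if dst == node_id]
        outputs[node_id] = layer_factory(chromosome.nodes[node_id], parents)
    return outputs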
When arbitrary connectivity is allowed between layers, additional complexity is
required. If the current layer has multiple parent layers, a merge layer must be
applied to the parents in order to ensure that the parents' outputs are the same
size as the current layer's input. Typically, this adjustment is done through a
concatenation or elementwise sum operation. If the parent layers have mismatched
output sizes, all of them must be downsampled to the size of the parent layer with
the smallest output. The specific method for downsampling is domain-dependent. For
example, in image classification, a max-pooling layer is inserted after specific parent
layers; in image captioning, a fully connected bottleneck layer serves this function.
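A minimal Keras-style sketch of such a merge, assuming the parent outputs are four-dimensional image feature maps with static shapes whose spatial sizes differ by integer factors; the function merge_parents and the pooling heuristic are illustrative, not the authors' implementation.

import tensorflow as tf

def merge_parents(parent_tensors, mode="concat"):
    # Downsample larger parent outputs to the spatial size of the smallest one,
    # then merge them by concatenation or elementwise sum.
    target = min(int(t.shape[1]) for t in parent_tensors)   # smallest spatial size
    resized = []
    for t in parent_tensors:
        size = int(t.shape[1])
        if size > target:
            factor = size // target                          # assumes an integer ratio
            t = tf.keras.layers.MaxPooling2D(pool_size=(factor, factor))(t)
        resized.append(t)
    if mode == "sum":
        # Elementwise sum also requires matching channel counts (assumed here).
        return tf.keras.layers.Add()(resized)
    return tf.keras.layers.Concatenate(axis=-1)(resized)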
During fitness evaluation, each chromosome is converted into a DNN. These
DNNs are then trained for a fixed number of epochs. After training, a metric that
indicates the network’s performance is returned to DeepNEAT and assigned
as fitness to the corresponding chromosome in the population.
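Concretely, the evaluation loop might look roughly like the following sketch, assuming Keras-style models and an assumed helper chromosome_to_model that performs the graph-to-network conversion described above; the epoch budget and the use of validation accuracy as the fitness metric are illustrative choices.

import tensorflow as tf

def evaluate_population(population, chromosome_to_model,
                        x_train, y_train, x_val, y_val, epochs=8):
    # Assign a fitness to every chromosome by briefly training its network.
    fitnesses = []
    for chromosome in population:
        model = chromosome_to_model(chromosome)   # assumed assembly helper
        lr = chromosome.global_params.get("learning_rate", 1e-3)
        model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr),
                      loss="sparse_categorical_crossentropy",
                      metrics=["accuracy"])
        model.fit(x_train, y_train, epochs=epochs, verbose=0)
        # The held-out accuracy is returned to the evolutionary algorithm as fitness.
        _, val_accuracy = model.evaluate(x_val, y_val, verbose=0)
        fitnesses.append(val_accuracy)
    return fitnesses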
While DeepNEAT can be used to evolve DNNs, the resulting structures are often
complex and unprincipled. They contrast with typical DNN architectures that utilize
repetition of basic components. DeepNEAT is therefore extended next to the evolution
of modules and blueprints.
3.2 COOPERATIVE COEVOLUTION OF MODULES AND BLUEPRINTS
Many of the most successful DNNs, such as GoogLeNet and ResNet, are composed
of modules that are repeated multiple times [5,7]. These modules often themselves
have complicated structure with branching and merging of various layers. Inspired
by this observation, a variant of DeepNEAT, called Coevolution DeepNEAT
(CoDeepNEAT), is proposed. The algorithm behind CoDeepNEAT is inspired
mainly by Hierarchical SANE [20], but is also influenced by the component-
evolution approaches ESP [33] and CoSyNE [19].
In CoDeepNEAT, two populations of modules and blueprints are evolved sepa-
rately, using the same methods as described above for DeepNEAT. The blueprint
chromosome is a graph where each node contains a pointer to a particular module
species. In turn, each module chromosome is a graph that represents a small