representations from data. The representation is both hierarchical and distributed: the relevant characteristics of a problem emerge gradually across successive levels (or layers), as a collective result, much as in shallow NNs. These representations support pattern recognition tasks, sometimes without any feature engineering, since the relevant features are extracted autonomously from the available data. Indeed, the successive latent representations are able to disentangle potential confounding factors in the input data, also reducing their complexity. Fig. 11.1B shows a deep architecture.
The large amount of research recently carried out in the field by academic and industrial groups is motivated by the striking successes achieved, including in precommercial competitions and applications; AlphaGo and Watson are two prominent examples. Major IT companies (e.g., Google, IBM, Intel, Facebook, Baidu, and Microsoft) hold a large share of the patents in the field and have made DL part of their core business. This resurgence of interest in the NN approach rests on the following developments:
1. General availability of large databases (Big Data) coming from international initiatives and worldwide collaborations on projects;
2. Availability of substantial computing power, mainly associated with Cloud computing and novel GPU extensions;
3. Availability of novel algorithms and processing architectures, as well as advanced computation paradigms such as quantum computing and memristor-based network implementations.
Indeed, as previously noted, the capacity of an NN is related to the number of its free parameters, whose estimation calls for large datasets; in turn, processing Big Data requires powerful computing.
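To make the link between capacity and data concrete, the following Python sketch (an illustration; the 784-1000-1000-10 architecture is an assumed example, not taken from the chapter) counts the free parameters of a fully connected network:

    # Each fully connected layer with n_in inputs and n_out outputs
    # contributes n_in * n_out weights plus n_out biases.
    def count_free_parameters(layer_sizes):
        return sum(n_in * n_out + n_out
                   for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))

    # Example (hypothetical architecture): even a modest 784-1000-1000-10
    # network has about 1.8 million free parameters to estimate.
    print(count_free_parameters([784, 1000, 1000, 10]))  # 1796010

Even this modest architecture has roughly 1.8 million free parameters, which illustrates why reliable estimation demands large datasets.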
Some DL schemes are biologically motivated: in essence, they are hierarchical DL architectures inspired by the brain's visual cortex. In particular, neurons in the visual cortex of cats respond to specific properties of visual sensory inputs, such as lines, edges, and colors, and successive layers combine such low-level features into higher-level features, ultimately supporting object recognition [3].
Several DL models have been proposed in the literature. In what follows, the best known are presented: Deep Belief Networks (DBNs), Stacked Autoencoders (SAEs), and Deep Convolutional Neural Networks (CNNs).
3.1 DEEP BELIEF NETWORKS
A DBN is a probabilistic generative model composed of stacked Restricted Boltzmann Machine (RBM) modules (Fig. 11.2) [4]. An RBM is an undirected, energy-based model with one layer of visible (v) units and one layer of hidden (h) units, with connections only between the two layers. Each RBM module is trained one at a time, in an unsupervised manner, using the contrastive divergence procedure [5]. The output (learned features) of each stage is used as the input to the subsequent RBM stage. Afterwards, the whole network is commonly trained with supervised learning to improve classification performance (fine-tuning), as sketched below.
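The following minimal NumPy sketch illustrates this greedy layer-wise scheme under simplifying assumptions: binary units, a single Gibbs step (CD-1), full-batch updates, and layer sizes and learning rate chosen arbitrarily for illustration. It is an outline of the idea, not the procedure of [4,5].

    import numpy as np

    rng = np.random.default_rng(0)

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sample_bernoulli(p):
        # Draw binary states with probability p (stochastic unit activation).
        return (rng.random(p.shape) < p).astype(p.dtype)

    class RBM:
        """Restricted Boltzmann Machine with binary visible and hidden units."""
        def __init__(self, n_visible, n_hidden, lr=0.1):
            self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
            self.b_v = np.zeros(n_visible)   # visible biases
            self.b_h = np.zeros(n_hidden)    # hidden biases
            self.lr = lr

        def hidden_probs(self, v):
            return sigmoid(v @ self.W + self.b_h)

        def visible_probs(self, h):
            return sigmoid(h @ self.W.T + self.b_v)

        def cd1_step(self, v0):
            """One contrastive divergence (CD-1) update on a batch v0."""
            # Positive phase: clamp the data, sample the hidden units.
            ph0 = self.hidden_probs(v0)
            h0 = sample_bernoulli(ph0)
            # Negative phase: one Gibbs step back to the visibles and up again.
            pv1 = self.visible_probs(h0)
            ph1 = self.hidden_probs(pv1)
            # Gradient approximation: data statistics minus reconstruction statistics.
            batch = v0.shape[0]
            self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / batch
            self.b_v += self.lr * (v0 - pv1).mean(axis=0)
            self.b_h += self.lr * (ph0 - ph1).mean(axis=0)

    def pretrain_dbn(data, layer_sizes, epochs=5):
        # Greedy layer-wise pretraining: each RBM is trained, unsupervised,
        # on the hidden activations produced by the RBM below it.
        rbms, x = [], data
        for n_hidden in layer_sizes:
            rbm = RBM(x.shape[1], n_hidden)
            for _ in range(epochs):
                rbm.cd1_step(x)
            rbms.append(rbm)
            x = rbm.hidden_probs(x)   # learned features feed the next stage
        return rbms

In this sketch, the probabilities produced by the top RBM would serve as the learned features fed to a supervised classifier during the fine-tuning step described above.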