Page 232 - Artificial Intelligence in the Age of Neural Networks and Brain Computing

3. Deep Architectures and Learning




representations from data. The representation is both hierarchical and distributed: the relevant characteristics of a problem emerge gradually across successive levels (or layers) as a collective result, much as in shallow NNs. These representations facilitate pattern recognition tasks, sometimes without any feature engineering, since the features are extracted autonomously from the available data. Indeed, the successive latent representations are able to disentangle potential confounding factors in the input data, also reducing their complexity. Fig. 11.1B shows a deep architecture. The large body of research recently carried out in the field by academic and industrial groups is motivated by the surprising successes achieved in precommercial competitions and applications; AlphaGo and Watson are two relevant examples. Major IT companies (e.g., Google, IBM, Intel, Facebook, Baidu, and Microsoft) hold a large number of patents in the field and have made DL part of their core business. This resurgence of interest in the NN approach is related to the following developments:

1. General availability of large databases (Big Data) coming from international
   initiatives and worldwide collaborations on projects;
                  2. Availability of big computing power mainly associated with Cloud computing
                     and novel GPU extensions;
                  3. Availability of novel algorithms and processing architectures, or advanced par-
                     adigms of computation, like quantum computing and memristor-based network
                     implementations.
   Indeed, as previously noted, the capacity of an NN chain is related to the number
of free parameters, whose estimation calls for large datasets. In turn, processing Big
Data requires powerful computing.
   Some DL schemes are biologically motivated; in essence, they are hierarchical DL
architectures inspired by the brain's visual cortex. In particular, neurons found in
the visual cortex of cats respond to specific properties of visual sensory inputs,
such as lines, edges, and colors, and successive layers extract combinations of these
low-level features to derive higher-level features, resulting in object recognition [3].
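The idea of orientation-selective low-level feature detectors can be illustrated with a minimal sketch: a hand-crafted Sobel-like filter convolved over a toy image responds only where a vertical edge is present. The image, filter, and helper function here are illustrative assumptions, not taken from the text.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation (no padding, stride 1)."""
    kh, kw = kernel.shape
    ih, iw = image.shape
    out = np.zeros((ih - kh + 1, iw - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy 8x8 image with a vertical edge: left half dark, right half bright
image = np.zeros((8, 8))
image[:, 4:] = 1.0

# Sobel-like vertical-edge detector, analogous to an
# orientation-selective receptive field in the visual cortex
kernel = np.array([[-1., 0., 1.],
                   [-2., 0., 2.],
                   [-1., 0., 1.]])

response = conv2d(image, kernel)
# The response is strong only in the columns covering the edge;
# a deeper layer would combine several such feature maps.
```

In a CNN, many such filters are learned from data rather than hand-crafted, and successive layers combine their responses into progressively higher-level features.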
   Several DL models have been proposed in the literature. In what follows, the
best known are presented: Deep Belief Networks (DBNs), Stacked Autoencoders
(SAEs), and Deep Convolutional Neural Networks (CNNs).


                  3.1 DEEP BELIEF NETWORKS
A DBN is a probabilistic generative model composed of stacked modules of
Restricted Boltzmann Machines (RBMs) (Fig. 11.2) [4]. An RBM is an undirected,
energy-based model with a layer of visible (v) units and a layer of hidden (h) units,
with connections only between the layers. Each RBM module is trained one at a time
in an unsupervised manner using the contrastive divergence procedure [5]. The output
(learned features) of each stage is used as input to the subsequent RBM stage.
Afterwards, the whole network is commonly trained with supervised learning to
improve classification performance (fine-tuning).
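The greedy layer-wise procedure just described can be sketched with a minimal NumPy RBM trained by one-step contrastive divergence (CD-1). The layer sizes, learning rate, and random data below are illustrative assumptions, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Minimal binary RBM; hyperparameters are illustrative."""
    def __init__(self, n_visible, n_hidden, lr=0.1):
        self.W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
        self.b = np.zeros(n_visible)   # visible biases
        self.c = np.zeros(n_hidden)    # hidden biases
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.c)

    def visible_probs(self, h):
        return sigmoid(h @ self.W.T + self.b)

    def cd1_step(self, v0):
        # Positive phase: hidden activations driven by the data
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one step of Gibbs sampling (CD-1)
        pv1 = self.visible_probs(h0)
        ph1 = self.hidden_probs(pv1)
        # Approximate gradient update on weights and biases
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / len(v0)
        self.b += self.lr * (v0 - pv1).mean(axis=0)
        self.c += self.lr * (ph0 - ph1).mean(axis=0)
        return float(((v0 - pv1) ** 2).mean())  # reconstruction error

# Greedy layer-wise pretraining of a two-RBM stack on toy binary data
data = (rng.random((64, 16)) < 0.3).astype(float)
rbm1, rbm2 = RBM(16, 8), RBM(8, 4)
for _ in range(200):
    rbm1.cd1_step(data)
features = rbm1.hidden_probs(data)   # stage-1 output feeds stage 2
for _ in range(200):
    rbm2.cd1_step(features)
```

After this unsupervised pretraining, the stacked weights would typically initialize a feed-forward network that is fine-tuned with supervised learning, as noted above.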