The goal of SL is to derive optimal weight matrices that minimize, or at least achieve a small, error. NNs are, however, also asked to generalize well to unseen input vectors, that is, to produce small errors on test cases as well. In UL there is no teacher, and the NN is asked to autonomously extract some underlying statistical regularities from the available data. In SSL, a UL prestage is used to facilitate the subsequent SL procedure; when both labeled and unlabeled data are available, this can help extract additional information about the problem under analysis. In RL, the machine interacts with the surrounding environment and, by observing the consequences of its actions, learns how to modify its own behavior in response to some form of “reward” received.
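As an illustration of the SL case, the following is a minimal sketch in Python (the one-hidden-layer network, the toy sine-regression task, and all parameter values are illustrative assumptions, not a specific published model); it minimizes a squared error on labeled training examples by gradient descent and then probes generalization on held-out test cases:

import numpy as np

rng = np.random.default_rng(0)

# Toy supervised task (assumption for illustration): learn y = sin(x).
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X) + 0.1 * rng.standard_normal(X.shape)
X_train, y_train = X[:150], y[:150]   # labeled examples seen by the "teacher"
X_test, y_test = X[150:], y[150:]     # unseen inputs probing generalization

# One hidden layer: the weights are the free parameters SL adjusts.
W1 = 0.5 * rng.standard_normal((1, 16)); b1 = np.zeros(16)
W2 = 0.5 * rng.standard_normal((16, 1)); b2 = np.zeros(1)

def forward(X):
    h = np.tanh(X @ W1 + b1)
    return h, h @ W2 + b2

lr = 0.05
for epoch in range(2000):
    h, y_hat = forward(X_train)
    err = y_hat - y_train                 # training-error signal
    # Backpropagate the squared-error gradient through both layers.
    gW2 = h.T @ err / len(X_train); gb2 = err.mean(axis=0)
    gh = (err @ W2.T) * (1 - h ** 2)
    gW1 = X_train.T @ gh / len(X_train); gb1 = gh.mean(axis=0)
    W2 -= lr * gW2; b2 -= lr * gb2
    W1 -= lr * gW1; b1 -= lr * gb1

_, y_hat_test = forward(X_test)
print("test MSE:", float(np.mean((y_hat_test - y_test) ** 2)))

A small test error alongside a small training error is what “generalizing well to unseen input vectors” means in practice.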
The various approximation theorems require a “sufficient” number of hidden nodes to satisfy the universal approximation properties. This, however, implies in turn a high number of degrees of freedom in the optimization procedure. The capacity of an NN is statistically bounded and reflects a relation between the number of weights to be determined and the number of “examples” available to train them. A high number of free parameters increases the descriptive complexity of the NN, which is approximately related to the number of bits of information required to describe it, and this complexity limits the generalization ability. Poor generalization performance reduced the impact of the NN approach after an initial enthusiastic acceptance. Several techniques have been proposed to reduce the complexity of NNs; among the concepts relevant to DL are weight sharing, regularization, and the forced introduction of data invariances (the first two are sketched below).
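The following is a minimal sketch in Python of two of these complexity-reduction devices (the penalty strength, kernel values, and signal are illustrative assumptions): an L2 weight-decay penalty added to the gradient step, and weight sharing in the form of a 1-D convolution that reuses one small kernel at every signal position:

import numpy as np

rng = np.random.default_rng(1)

# Regularization: an L2 penalty lam * ||W||^2 added to the loss shrinks
# the weights, reducing the effective number of free parameters at the
# price of a little extra training error.
def l2_regularized_step(W, grad, lr=0.01, lam=1e-3):
    return W - lr * (grad + 2.0 * lam * W)

# Weight sharing: a 1-D convolution applies the same small kernel at
# every position, so the layer is described by len(kernel) weights
# instead of one weight per input-output pair.
def conv1d(x, kernel):
    return np.convolve(x, kernel, mode="valid")

x = rng.standard_normal(128)            # toy input signal (assumption)
kernel = np.array([0.25, 0.5, 0.25])    # 3 shared weights
features = conv1d(x, kernel)            # 126 outputs from only 3 parameters

Both devices lower the descriptive complexity of the network, and hence the number of bits needed to specify it, without reducing the number of hidden units.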
3. DEEP ARCHITECTURES AND LEARNING
DL methods iteratively modify several sets of parameters (the weights and the biases of the layer-to-layer matrices) by minimizing a loss/cost function, aiming to define an optimal set. However, the performance of DL and, more generally, of ML and NN approaches strongly depends on the quality of the available data and on the careful selection of a representation suitable for the task at hand. Most of the effort in designing the processing chain is thus devoted to data preprocessing or domain transformation.
Time series are commonly analyzed in the time, frequency, or time-frequency domain; the related transformation constitutes an engineering way to extract features from the data in a specific domain (a spectrogram sketch at the end of this section illustrates this). DL represents the most significant and successful advance in ML over the last decade. A large part of the current appeal of DL techniques derives from the possibility of acquiring data representations that are not
model-based but entirely data-driven. This circumvents the need to hand-design features, and the hierarchically organized learned features are often richer and more powerful than suitably engineered ones. DL is indeed an emerging methodology firmly rooted within the traditional ML community, whose main objective is to design learning algorithms and architectures for extracting multiple levels of representation from the data.
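As an example of the engineered domain transformations mentioned above, the following is a minimal sketch in Python of time-frequency feature extraction via a spectrogram (NumPy and SciPy are assumed available; the synthetic signal, sampling rate, and window lengths are illustrative assumptions):

import numpy as np
from scipy import signal

fs = 256                                # assumed sampling rate, in Hz
t = np.arange(0, 4, 1 / fs)             # 4 s synthetic time series
# Toy signal: a 10 Hz rhythm switching to 20 Hz halfway through.
x = np.where(t < 2, np.sin(2 * np.pi * 10 * t), np.sin(2 * np.pi * 20 * t))
x = x + 0.2 * np.random.default_rng(0).standard_normal(t.size)

# Short-time Fourier analysis: each column of Sxx is a spectral snapshot,
# yielding a time-frequency "image" usable as an engineered feature map.
f, tt, Sxx = signal.spectrogram(x, fs=fs, nperseg=128, noverlap=64)
print(Sxx.shape)                        # (frequency bins, time frames)

A DL pipeline can either consume such engineered maps or, as discussed above, learn its own representation directly from the raw time series.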