Page 231 - Artificial Intelligence in the Age of Neural Networks and Brain Computing

222    CHAPTER 11 Deep Learning Approaches to Electrophysiological




                         The goal of SL is to derive optimal weight matrices that minimize, or at least
                         achieve a small, error. NNs, however, are also asked to generalize well to unseen
                         input vectors, that is, to produce small errors on test cases as well. In UL, there is
                         no teacher, and the NN is asked to autonomously extract underlying statistical
                         regularities from the available data. In SSL, a prestage of UL is used to facilitate
                         the subsequent SL procedure; when both labeled and unlabeled data are
                         available, this combination can help to extract additional information about
                         the problem under analysis. In RL, the machine interacts with the surrounding
                         environment and, by observing the consequences of its actions, learns how to
                         modify its own behavior in response to some form of “reward” received.
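The SL setting described above can be sketched in a few lines: a teacher provides target outputs, and gradient descent adjusts the free parameters to minimize the error on the training examples. The toy model, learning rate, and dataset below are illustrative choices, not taken from the chapter.

```python
import numpy as np

# Toy supervised problem: inputs x with teacher-provided labels y = 2x + 1 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2.0 * x + 1.0 + 0.05 * rng.normal(size=100)

# A single weight w and bias b: the "weight matrices" reduce to scalars here.
w, b = 0.0, 0.0
lr = 0.5  # learning rate (hypothetical choice)

for _ in range(200):
    pred = w * x + b
    err = pred - y
    # Gradient of the mean squared error with respect to w and b.
    w -= lr * 2 * np.mean(err * x)
    b -= lr * 2 * np.mean(err)

print(round(w, 2), round(b, 2))  # close to the true parameters 2 and 1
```

Generalization would then be assessed on a held-out test set drawn from the same distribution, not on the training pairs themselves.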
                            The various approximation theorems require a “sufficient” number of hidden
                         nodes for the universal approximation property to hold. This, however, implies
                         a high number of degrees of freedom in the optimization procedure. The capacity
                         of the NN is statistically bounded and underlies a relation between the number
                         of weights to be determined and the number of “examples” available to train
                         them. A high number of free parameters increases the descriptive complexity of
                         the NN, roughly related to the number of bits of information required to describe
                         it, and this complexity limits its generalization ability. Poor generalization
                         performance reduced the impact of the NN approach after an initial enthusiastic
                         reception. Several techniques have been proposed to reduce the complexity of
                         NNs; among the concepts relevant to DL are weight sharing, regularization,
                         and the forced introduction of data invariances.
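Regularization, one of the complexity-reduction techniques just mentioned, can be illustrated with ridge (L2) regression: penalizing the squared norm of the weights shrinks the free parameters when examples are scarce. The data sizes and penalty strength below are illustrative assumptions.

```python
import numpy as np

# Few examples, many parameters: the regime where complexity control matters.
rng = np.random.default_rng(1)
X = rng.normal(size=(20, 10))
w_true = np.zeros(10)
w_true[0] = 3.0  # only one truly informative weight
y = X @ w_true + 0.1 * rng.normal(size=20)

lam = 1.0  # regularization strength (hypothetical choice)

# Closed-form ridge solution: (X^T X + lam I)^{-1} X^T y.
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(10), X.T @ y)
# Unregularized least squares for comparison.
w_plain = np.linalg.solve(X.T @ X, X.T @ y)

# The penalty shrinks the weight norm, i.e. yields a "simpler" model.
print(np.linalg.norm(w_ridge) <= np.linalg.norm(w_plain))  # True
```

Weight sharing achieves a similar effect structurally, by forcing groups of connections (e.g., the filters of a convolutional layer) to reuse the same few parameters.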




                         3. DEEP ARCHITECTURES AND LEARNING
                         DL methods iteratively modify several sets of parameters (the weights and the
                         biases of the layer-to-layer matrices) by minimizing a loss/cost function, aiming to
                         define an optimal set. However, the performance of DL, and more generally of ML
                         and NN approaches, strongly depends on the quality of the available data and on
                         the careful selection of a representation suitable for the task at hand. Most of the
                         effort in designing the processing chain is thus devoted to data preprocessing or
                         domain transformation. Time series are commonly analyzed in the time,
                         frequency, or time-frequency domain.
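A frequency-domain transformation of this kind can be sketched with an FFT: the raw time series is mapped to a power spectrum, from which fixed descriptors such as band powers are read off. The sampling rate, signal, and band edges below are illustrative assumptions, not values from the chapter.

```python
import numpy as np

fs = 100.0                       # sampling rate in Hz (assumed)
t = np.arange(0, 2.0, 1.0 / fs)  # 2 seconds of signal
# Synthetic time series: a strong 10 Hz and a weaker 25 Hz component.
x = np.sin(2 * np.pi * 10 * t) + 0.5 * np.sin(2 * np.pi * 25 * t)

# Power spectrum via the real-input FFT.
spectrum = np.abs(np.fft.rfft(x)) ** 2
freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)

def band_power(lo, hi):
    """Total spectral power in the band [lo, hi) Hz: one engineered feature."""
    mask = (freqs >= lo) & (freqs < hi)
    return spectrum[mask].sum()

print(band_power(8, 13) > band_power(20, 30))  # the 10 Hz component dominates
```

Such hand-picked band powers are exactly the kind of engineered features that, as discussed next, DL methods can learn directly from the data instead.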
                         The related transformation constitutes an engineering way to extract features from
                         the data in some specific domain. DL represents the most significant and success-
                         ful advance in ML over the last decade. A large part of the current appeal of DL
                         techniques derives from the possibility of acquiring data representations that are
                         not model-based but entirely data-driven, which circumvents the need for hand-
                         designed features. The hierarchically organized learned features are often richer
                         and more powerful than suitably engineered ones. DL is indeed an emerging
                         methodology firmly rooted in the traditional ML community, whose main
                         objective is to design learning algorithms and architectures for extracting multiple
                         level