
238    CHAPTER 11 Deep Learning Approaches to Electrophysiological




This opens relevant possibilities in the immediate care of patients, for example, in
life-threatening situations such as epileptic absences. However, it also raises novel
challenges for DL, such as the resource-constrained use of low-power devices. The EEG is a
complex signal, that is, a multivariate nonstationary time-series, which is inherently
high-dimensional when the temporal, spectral, and channel dynamics are taken into account.
Recently, various DL architectures have been proposed to decode disease- or task-
related information from the raw EEG recording, with and without handcrafted features.
Higher-level features extracted from DL can be analyzed, visualized, and interpreted to
yield a different perspective with respect to conventional engineered features. Despite
the exponential growth of research papers on DL, in most cases a black-box approach
is still adopted. In what follows, some of the critical issues of presently investigated
DL are briefly summarized.
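To make the high-dimensionality claim concrete, the following minimal sketch builds a time-frequency representation of a short multichannel recording; the channel count, sampling rate, and window parameters are illustrative assumptions, not values from the text, and random noise stands in for real EEG.

```python
import numpy as np

rng = np.random.default_rng(0)
fs = 250                              # assumed sampling rate in Hz
n_channels, n_samples = 19, 2 * fs    # assumed 19-channel, 2-second recording
eeg = rng.standard_normal((n_channels, n_samples))  # stand-in for raw EEG

# Sliding-window spectra expose the three axes mentioned above:
# channel, time (window index), and frequency.
win, step = 128, 64                   # assumed window length and hop
starts = range(0, n_samples - win + 1, step)
spectra = np.stack(
    [np.abs(np.fft.rfft(eeg[:, s:s + win], axis=-1)) for s in starts],
    axis=1,
)
print(spectra.shape)  # (19, 6, 65): channels x windows x frequency bins
print(spectra.size)   # 7410 features from just 2 seconds of signal
```

Even this short, modestly sampled recording yields thousands of features once channel, temporal, and spectral dynamics are represented jointly, which is the dimensionality burden the text refers to.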


                         6.1 DL INTERPRETABILITY
Despite the countless successes, general methods to interpret how DL networks make
decisions are lacking. There is no theoretical understanding of how learning evolves
in DL networks and how it generates their inner organization. This inability to
explain decisions to clinicians prevents the practical use of any predictive
outcome. Some information-theoretic models have been proposed to “open the
black box”: in particular, it has been suggested that the network optimizes the Infor-
mation Bottleneck tradeoff between prediction and compression in the successive
layers [40]. Essentially, it has been shown that DL networks devote most of the
training process to learning efficient representations of the input rather than
to fitting the labels. This consideration seems to confirm the importance of UL
techniques, for which only unsatisfactory algorithms have been devised so far [41].
Future advances in UL will focus on finding structural information in the input
signals and on building generative models: generative adversarial networks are
indeed a highly promising direction of research [42].
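The Information Bottleneck tradeoff mentioned above, in the standard formulation of [40], can be stated as follows: for an input $X$, a label $Y$, and a layer representation $T$, the network is conjectured to minimize

```latex
\min_{p(t \mid x)} \; I(X;T) - \beta \, I(T;Y)
```

where $I(\cdot\,;\cdot)$ denotes mutual information and $\beta > 0$ balances compression of the input (small $I(X;T)$) against preservation of label-relevant information (large $I(T;Y)$). The symbols $X$, $Y$, $T$, and $\beta$ are the conventional ones from the Information Bottleneck literature and are not introduced in the surrounding text.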


                         6.2 ADVANCED LEARNING APPROACHES IN DL
One of the problems with DL is overfitting of the training data, as the number of
free parameters is often quite high compared to the size of the training set; in this
case, DL performs poorly in generalization, that is, on held-out test and validation
examples. This effect is particularly serious in a clinical setting, where fresh
data often refer to a novel patient. Several strategies for reducing the impact of
overfitting have been proposed in the literature. One of these suggests randomly
omitting half of the feature detectors on each training example [43]. The use of
AEs with UL is also quite beneficial, particularly when the learning cost functions
involve regularization terms, as in the dropout method. SAEs face the problem of
vanishing gradients, which become negligible during training; as a consequence, the DL
network tends to learn the average of all the training examples. Furthermore, the
AE treats all inputs equally and its representational capability aims to minimize