Page 248 - Artificial Intelligence in the Age of Neural Networks and Brain Computing





                  the reconstruction error of each input. This is a shortcoming in the presence of noisy
                  data. In addition, being trained without knowing the labels, the AE cannot discrim-
                  inate between task-irrelevant and task-relevant information in the dataset. One
                  approach to these problems is to pretrain DL networks and to evaluate the
                  architectural design (i.e., the number of layers and of their nodes) by means of
                  information-theoretic quantities, such as entropy and mutual information [44]. In
                  other works, the impact of the hidden nodes is measured simply by the variance of
                  their output. Indeed, any node with constant activation across different inputs fails
                  to convey discriminative information about the input vectors. This observation can
                  help reduce the size of the hidden layers during or after training. Another proposed
                  way to mitigate the limitations of deep architectures is to replace the standard,
                  biologically plausible sigmoidal neuron nonlinearity with a strongly nonlinear
                  rectifier, which creates sparse representations with true zeros. The EEG is not
                  sparse per se, but interchannel and intrachannel redundancy can be exploited to
                  generate a block-sparse representation in suitable domains [45]. An evident
                  advantage of rectifying neurons is that they allow DL networks to disentangle
                  information better than dense DL networks. In addition, by varying the number of
                  active hidden neurons, and thus providing a variable-size, flexible data structure,
                  they can represent the effective dimensionality of the input data vectors.
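The two ideas above can be illustrated with a minimal numpy sketch, assuming a hypothetical one-layer rectified encoder with made-up weights and a variance threshold chosen only for illustration: rectifiers produce true zeros (sparsity), and hidden units whose activation variance across inputs is near zero convey no discriminative information and can be pruned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-layer encoder: 64 inputs -> 32 hidden units, ReLU nonlinearity.
W = rng.normal(scale=0.1, size=(64, 32))
b = rng.normal(scale=0.1, size=32)

X = rng.normal(size=(500, 64))      # a batch of input vectors
H = np.maximum(X @ W + b, 0.0)      # rectified hidden activations

# Rectifiers yield true zeros, so the representation is genuinely sparse,
# unlike the dense output of sigmoidal units.
sparsity = np.mean(H == 0.0)

# Variance of each hidden unit across inputs: a near-constant unit
# carries no discriminative information and is a candidate for pruning.
var = H.var(axis=0)
keep = var > 1e-3                   # hypothetical pruning threshold
H_pruned = H[:, keep]

print(f"fraction of zero activations: {sparsity:.2f}")
print(f"hidden units kept: {int(keep.sum())} of {len(var)}")
```

In practice the threshold would be tuned (or the variance statistics monitored during training), but the mechanism is the same: units that barely vary across the dataset can be removed with little loss of information.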


                  6.3 ROBUSTNESS OF DL NETWORKS
                  DL is rapidly becoming a standard approach for the processing of medical and health
                  data. In particular, DL provides the opportunity to automate the extraction of relevant
                  features (in contrast to the highly subjective interpretation of diagnostic data), to
                  integrate multimodal data, and to combine the extraction stage with classification
                  procedures. Classification performance is often limited, however, because the avail-
                  able databases are typically small and incomplete, and data preprocessing commonly
                  remains a crucial step. The designed classifiers sometimes fail robustness checks:
                  trained models can lose performance when small perturbations are applied to the
                  examples. Adversarial perturbations are a relevant example. A rectifier-based sparse
                  representation is typically robust to small input changes, as the set of nonzero
                  features is well conserved. From geometric considerations, it has been shown that
                  the high instability of DL networks is related to data points that reside very close to
                  the classifier's decision boundary. As the robustness of DL is a critical requirement
                  in a clinical setting, novel strategies for designing and training DL schemes should
                  be devised by the community in the years to come.
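The geometric point can be made concrete with a toy sketch (a hypothetical linear classifier, not any model from the cited work): for a point near the decision boundary, a tiny perturbation along the boundary normal, the direction an adversary would choose, flips the predicted class, while the same perturbation leaves a point far from the boundary unaffected.

```python
import numpy as np

# Hypothetical linear classifier with decision boundary w.x + b = 0.
w = np.array([1.0, 1.0])
b = 0.0
predict = lambda x: int(w @ x + b > 0)

near = np.array([0.05, 0.0])   # lies very close to the boundary
far = np.array([2.0, 2.0])     # lies well inside the positive class

# Worst-case perturbation of Euclidean size eps along the boundary normal.
eps = 0.1
delta = -eps * w / np.linalg.norm(w)

flip_near = predict(near) != predict(near + delta)
flip_far = predict(far) != predict(far + delta)
print(flip_near, flip_far)     # prints: True False
```

The margin (distance to the boundary) thus acts as a robustness budget: instability concentrates on the small-margin points, which is why clinical deployments need either larger margins or explicit robustness checks.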



                  7. CONCLUSIONS

                  DL can yield appropriate tools for analyzing multivariate time-series data from a
                  genuinely new perspective, which is both fully data-driven and automatic but can