Page 248 - Artificial Intelligence in the Age of Neural Networks and Brain Computing
the reconstruction error of each input. This is a shortcoming in the presence of noisy
data. In addition, because it is trained without labels, the AE cannot discriminate between task-relevant and task-irrelevant information in the dataset. One approach to these problems is to pretrain DL networks and to evaluate the architectural design (i.e., the number of layers and of nodes per layer) by means of information-theoretic quantities, such as entropy and mutual information [44]. In other works, the impact of each hidden node is measured simply
by the variance of the output of the nodes. Indeed, any node with constant activation
across different inputs fails to convey discriminative information about the input
vectors. This observation can help reduce the size of the hidden layers during or after training. Another proposed way to mitigate the limitations of deep architectures is to replace the standard, biologically plausible, sigmoidal neuron nonlinearity with a strongly nonlinear rectifier that helps to create sparse representations with true zeros. The EEG is not sparse per se, but interchannel and intrachannel redundancy can be exploited to generate a block-sparse representation in suitable domains [45]. An evident advantage of rectifying neurons is that they allow DL networks to disentangle information better than dense DL networks. In addition, by varying the number of active hidden neurons, and thus providing a flexible, variable-size data structure, they can represent the effective dimensionality of the input data vectors.
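The variance criterion mentioned above can be sketched in a few lines: hidden units whose activation is (near-)constant across inputs are dropped. This is only a minimal illustration; the layer sizes, tanh nonlinearity, and pruning threshold are illustrative assumptions, not details from the chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" hidden layer: 64 inputs -> 32 sigmoidal-type units.
X = rng.normal(size=(500, 64))          # 500 input vectors
W = 0.1 * rng.normal(size=(64, 32))     # hidden-layer weights
b = 0.1 * rng.normal(size=32)           # hidden-layer biases
W[:, :4] = 0.0                          # make the first 4 units "dead":
b[:4] = 0.0                             # constant (zero) output for any input

H = np.tanh(X @ W + b)                  # activations over the whole dataset
var = H.var(axis=0)                     # per-unit variance across inputs

# Units with (near-)zero variance convey no discriminative information.
keep = var > 1e-6
W_pruned, b_pruned = W[:, keep], b[keep]
print(W_pruned.shape)                   # (64, 28): 4 constant units removed
```

The same variance ranking can also be applied during training, e.g., to shrink a layer between training epochs rather than only after convergence.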
6.3 ROBUSTNESS OF DL NETWORKS
DL is rapidly becoming a standard approach for the processing of medical and health
data. In particular, DL offers the opportunity to automate the extraction of relevant features (in contrast to the highly subjective manual interpretation of diagnostic data), to integrate multimodal data, and to combine the extraction stage with classification procedures. Classification performance is often limited, however, because the available databases are typically small and incomplete, and data preprocessing commonly remains a crucial step. The resulting classifiers sometimes fail robustness checks: trained models can lose performance when small perturbations are applied to the examples. Adversarial perturbations are a prominent example. A
rectifier-based sparse representation is typically robust to small input changes, as the
set of nonzero features is well conserved. From geometric considerations, it has been
shown that high instability of DL networks is related to data points that reside very
close to the classifier’s decision boundary. As the robustness of DL is a critical requirement in clinical settings, novel strategies for the design and training of DL schemes should be devised by the community in the years to come.
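The two properties of rectifier codes invoked above, true zeros and a nonzero support that is well conserved under small input perturbations, can be checked with a short numerical sketch; the layer size and noise scale here are arbitrary assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

W = rng.normal(size=(100, 50)) / np.sqrt(100)  # one illustrative linear layer
x = rng.normal(size=100)                       # an input vector

relu = lambda z: np.maximum(0.0, z)

h = relu(W.T @ x)                 # rectified code: exact zeros appear
sparsity = (h == 0).mean()
print(sparsity)                   # roughly 0.5: about half the units are off

x_pert = x + 0.01 * rng.normal(size=100)       # small input perturbation
h_pert = relu(W.T @ x_pert)

s0 = set(np.flatnonzero(h))       # active (nonzero) feature sets
s1 = set(np.flatnonzero(h_pert))
overlap = len(s0 & s1) / len(s0 | s1)
print(overlap)                    # close to 1.0: the active set is conserved
```

Only units whose preactivation lies very close to zero can switch on or off under a small perturbation, which is why the active set, and hence the sparse code, is largely conserved.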
7. CONCLUSIONS
DL can yield appropriate tools for analyzing multivariate time-series data from a
genuinely new perspective, which is both fully data-driven and automatic but can