Page 248 - Artificial Intelligence in the Age of Neural Networks and Brain Computing





                  the reconstruction error of each input. This is a shortcoming in the presence of noisy
                  data. In addition, being trained without knowing the labels, the AE cannot discrim-
                  inate between task-irrelevant and task-relevant information in the dataset. One
                  approach to these problems is to pretrain DL networks and to evaluate the
                  architectural design (i.e., the number of layers and of their nodes) by means of
                  information-theoretic quantities, such as entropy and mutual information [44]. In
                  other works, the impact of the hidden nodes is measured simply by the variance of
                  their output. Indeed, any node with constant activation across different inputs fails
                  to convey discriminative information about the input vectors. This observation can
                  help reduce the size of the hidden layers during or after training. Another proposed
                  way to mitigate the limitations of deep architectures is to replace the standard,
                  biologically plausible sigmoidal neuron nonlinearity with a strongly nonlinear
                  rectifier, which creates sparse representations with true zeros. The EEG is not
                  sparse per se, but interchannel and intrachannel redundancy can be exploited to
                  generate a block-sparse representation in suitable domains [45]. An evident
                  advantage of rectifying neurons is that they allow DL networks to disentangle
                  information better than dense DL networks. In addition, by varying the number of
                  active hidden neurons, and thus providing a variable-size, flexible data structure,
                  they can represent the effective dimensionality of the input data vectors.
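The two ideas above can be illustrated with a minimal numpy sketch, assuming a hypothetical one-layer rectified encoder with made-up weights and a variance threshold chosen only for illustration: rectifiers produce true zeros (sparsity), and hidden units whose activation variance across inputs is near zero convey no discriminative information and can be pruned.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical one-layer encoder: 64 inputs -> 32 hidden units, ReLU nonlinearity.
W = rng.normal(scale=0.1, size=(64, 32))
b = rng.normal(scale=0.1, size=32)

X = rng.normal(size=(500, 64))      # a batch of input vectors
H = np.maximum(X @ W + b, 0.0)      # rectified hidden activations

# Rectifiers yield true zeros, so the representation is genuinely sparse,
# unlike the dense output of sigmoidal units.
sparsity = np.mean(H == 0.0)

# Variance of each hidden unit across inputs: a near-constant unit
# carries no discriminative information and is a candidate for pruning.
var = H.var(axis=0)
keep = var > 1e-3                   # hypothetical pruning threshold
H_pruned = H[:, keep]

print(f"fraction of zero activations: {sparsity:.2f}")
print(f"hidden units kept: {int(keep.sum())} of {len(var)}")
```

In practice the threshold would be tuned (or the variance statistics monitored during training), but the mechanism is the same: units that barely vary across the dataset can be removed with little loss of information.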


                  6.3 ROBUSTNESS OF DL NETWORKS
                  DL is rapidly becoming a standard approach for the processing of medical and health
                  data. In particular, DL provides the opportunity to automate the extraction of relevant
                  features (in contrast to the highly subjective interpretation of diagnostic data), to
                  integrate multimodal data, and to combine the extraction stage with classification
                  procedures. Classification performance is often limited, however, because the avail-
                  able databases are typically small and incomplete, and data preprocessing commonly
                  remains a crucial step. The designed classifiers sometimes fail robustness checks:
                  trained models can lose performance when small perturbations are applied to the
                  examples. Adversarial perturbations are a relevant example. A rectifier-based sparse
                  representation is typically robust to small input changes, as the set of nonzero
                  features is well conserved. From geometric considerations, it has been shown that
                  the high instability of DL networks is related to data points that reside very close to
                  the classifier's decision boundary. As the robustness of DL is a critical requirement
                  in a clinical setting, novel strategies for designing and training DL schemes should
                  be devised by the community in the years to come.
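The geometric point can be made concrete with a toy sketch (a hypothetical linear classifier, not any model from the cited work): for a point near the decision boundary, a tiny perturbation along the boundary normal, the direction an adversary would choose, flips the predicted class, while the same perturbation leaves a point far from the boundary unaffected.

```python
import numpy as np

# Hypothetical linear classifier with decision boundary w.x + b = 0.
w = np.array([1.0, 1.0])
b = 0.0
predict = lambda x: int(w @ x + b > 0)

near = np.array([0.05, 0.0])   # lies very close to the boundary
far = np.array([2.0, 2.0])     # lies well inside the positive class

# Worst-case perturbation of Euclidean size eps along the boundary normal.
eps = 0.1
delta = -eps * w / np.linalg.norm(w)

flip_near = predict(near) != predict(near + delta)
flip_far = predict(far) != predict(far + delta)
print(flip_near, flip_far)     # prints: True False
```

The margin (distance to the boundary) thus acts as a robustness budget: instability concentrates on the small-margin points, which is why clinical deployments need either larger margins or explicit robustness checks.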



                  7. CONCLUSIONS

                  DL can yield appropriate tools for analyzing multivariate time-series data from a
                  genuinely new perspective, which is both fully data-driven and automatic but can