Page 274 - Artificial Intelligence in the Age of Neural Networks and Brain Computing
2. Multiview Learning 267
FIGURE 13.2
Data integration taxonomy.
according to the statistical problem, the type of analysis to be performed, the type of
data to be integrated, and the stage in which integration is accomplished.
2.1 INTEGRATION STAGE
When building a data analysis workflow, an investigator can choose to perform
the integration step at different stages; we can then distinguish between early,
intermediate, and late integration (see Fig. 13.3). The choice of one method over another
depends on problem-specific aspects, such as the heterogeneity of the input
data and the statistical problem to be addressed. Early integration is performed before
any analysis step, by directly manipulating the input data. This strategy consists of
concatenating all the variables from the multiple views to obtain a single feature
space, without changing the nature or general format of the data. It is usually applied
to combine data coming from multiple experiments into a bigger pool. The main
drawback of this methodology is the choice of a suitable distance metric: the
concatenation of views increases the dimensionality of the feature
space, which in turn can affect the performance of classical similarity measures [1].
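As a minimal sketch of early integration, the snippet below concatenates two views column-wise with NumPy. The function name `early_integration`, the random data, and the view sizes are hypothetical and chosen only for illustration; they do not come from the chapter. The final check shows one reason the distance metric becomes delicate: in the concatenated space, the squared Euclidean distance is simply the sum of the per-view squared distances, so a view with more features or a larger scale can dominate the comparison.

```python
import numpy as np


def early_integration(views):
    """Concatenate per-sample feature matrices column-wise into one feature space.

    Hypothetical helper: every view is a 2-D array with one row per sample,
    and all views list the same samples in the same order.
    """
    n_samples = views[0].shape[0]
    if any(v.shape[0] != n_samples for v in views):
        raise ValueError("all views must describe the same samples")
    return np.hstack(views)


# Illustrative random data, not taken from the source.
rng = np.random.default_rng(0)
view_a = rng.normal(size=(6, 4))   # 6 samples x 4 features
view_b = rng.normal(size=(6, 10))  # the same 6 samples x 10 features

combined = early_integration([view_a, view_b])
print(combined.shape)  # (6, 14): dimensionality is the sum over the views

# Squared Euclidean distance between samples 0 and 1, per view and combined.
d_a = np.sum((view_a[0] - view_a[1]) ** 2)
d_b = np.sum((view_b[0] - view_b[1]) ** 2)
d_ab = np.sum((combined[0] - combined[1]) ** 2)
print(np.isclose(d_ab, d_a + d_b))  # True: per-view distances simply add up
```

In practice, one might rescale each view before concatenation so that no single view dominates; that is a design choice assumed here, not a step prescribed by the text.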
While early integration builds a new feature space by concatenating different views,
intermediate integration transforms each data view into a common feature space, thus
avoiding the problem of increasing data dimensionality. For example, in classification
problems every view can be transformed into a similarity matrix (or kernel), and these
matrices can then be combined to obtain more accurate results. Finally, in the late