FIGURE 13.3
Data integration stages as proposed by Pavlidis et al., who introduced a Support Vector Machine (SVM) kernel-based approach to integrate microarray data. In early integration, the SVM is trained with a kernel obtained from the concatenation of all the views in the dataset (A). In intermediate integration, a kernel is first obtained for each view, and the combined kernel is then used to train the SVM (B). In late integration, a separate SVM is trained on the kernel of each view, and the final results are then combined (C).

integration approach, a distinct analysis workflow is carried out separately for each view, and only the results are integrated. The advantages of this methodology are that (1) the investigator can choose an ad hoc algorithm for each view, in order to obtain the best possible result for each kind of data, and (2) the analyses of the several views can be executed in parallel.
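
To make the three strategies concrete, the following is a minimal Python sketch, assuming two synthetic views and a plain linear kernel; the kernel averaging in (B) and the decision-value averaging in (C) are illustrative placeholders for whatever combination rule a given study adopts, not the exact method of Pavlidis et al.

# Minimal sketch of the three SVM-based integration strategies (early,
# intermediate, late). View names, shapes, and the averaging choices are
# illustrative assumptions, not the authors' pipeline.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 100
views = {                                   # two hypothetical omics views
    "expression": rng.normal(size=(n, 50)),
    "methylation": rng.normal(size=(n, 30)),
}
y = rng.integers(0, 2, size=n)              # synthetic binary labels

def linear_kernel(A, B):
    return A @ B.T

# (A) Early integration: concatenate the views, then build a single kernel.
X_early = np.hstack(list(views.values()))
K_early = linear_kernel(X_early, X_early)
svm_early = SVC(kernel="precomputed").fit(K_early, y)

# (B) Intermediate integration: one kernel per view, combined (here: averaged)
# into a single kernel used to train one SVM.
K_views = [linear_kernel(X, X) for X in views.values()]
K_combined = np.mean(K_views, axis=0)
svm_intermediate = SVC(kernel="precomputed").fit(K_combined, y)

# (C) Late integration: one SVM per view; only the outputs are combined,
# here by averaging decision values as a simple stand-in for result fusion.
late_models = [SVC(kernel="precomputed").fit(K, y) for K in K_views]
decision = np.mean(
    [m.decision_function(K) for m, K in zip(late_models, K_views)], axis=0
)
late_predictions = (decision > 0).astype(int)

The precomputed-kernel interface makes the contrast explicit: the classifier is the same in all three cases, and only the point at which the views are merged changes.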

2.2 TYPE OF DATA
Considering the nature of the data under study, we can distinguish between the integration of homogeneous and heterogeneous data. In systems biology, data are said to be homogeneous if they assay the same molecular level, such as gene or protein expression or copy number variation. Conversely, heterogeneous data are derived from two or more different molecular levels. In the latter case, some challenges need to be tackled by the investigator. First of all, the data may differ in format and structure, ranging from discrete or continuous numerical values to more complex data such as sequences or graphs. Moreover, different data sources may be characterized by different noise levels, depending on the platform and on the technologies used to generate the data. For this reason, a batch effect removal step needs to be included in the integration phase, in order to bring the noise and the random or systematic errors of the different views to comparable levels [2].
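
As an illustration of this preprocessing step, here is a minimal sketch, assuming a single synthetic view measured in three batches; the per-view standardization and per-batch mean centering shown here are only rough stand-ins for dedicated batch effect removal methods such as ComBat.

# Minimal sketch of per-view standardization plus a crude per-batch
# mean-centering step. The view/batch layout is an illustrative assumption.
import numpy as np

def standardize(X):
    """Bring a view's features to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                       # guard against constant features
    return (X - mu) / sd

def center_batches(X, batch_labels):
    """Remove per-batch mean shifts (a very rough batch-effect correction)."""
    X = X.copy()
    for b in np.unique(batch_labels):
        mask = batch_labels == b
        X[mask] -= X[mask].mean(axis=0)
    return X

rng = np.random.default_rng(0)
expression = rng.normal(size=(60, 20)) + 5.0    # hypothetical noisy view
batches = np.repeat([0, 1, 2], 20)              # samples from three platforms
expression[batches == 1] += 2.0                 # simulated systematic batch shift

cleaned = standardize(center_batches(expression, batches))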