FIGURE 13.3
Data integration stages as proposed by Pavlidis et al., who introduced a Support Vector Machine (SVM) kernel-based approach to integrate microarray data. In early integration, the SVM is trained with a kernel obtained from the concatenation of all the views in the dataset (A). In intermediate integration, a kernel is first obtained for each view, and the combined kernel is then used to train the SVM (B). In late integration, a separate SVM is trained on the kernel of each view, and the final results are then combined (C).

integration approach, a distinct analysis workflow is carried out separately for each view, and only the results are integrated. The advantages of this methodology are that (1) the investigator can choose an ad hoc algorithm for each view, in order to obtain the best possible result for each kind of data, and (2) the analyses of the several views can be executed in parallel.
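
To make the three strategies concrete, the following is a minimal Python sketch, assuming two synthetic views and a plain linear kernel; the kernel averaging in (B) and the decision-value averaging in (C) are illustrative placeholders for whatever combination rule a given study adopts, not the exact method of Pavlidis et al.

# Minimal sketch of the three SVM-based integration strategies (early,
# intermediate, late). View names, shapes, and the averaging choices are
# illustrative assumptions, not the authors' pipeline.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n = 100
views = {                                   # two hypothetical omics views
    "expression": rng.normal(size=(n, 50)),
    "methylation": rng.normal(size=(n, 30)),
}
y = rng.integers(0, 2, size=n)              # synthetic binary labels

def linear_kernel(A, B):
    return A @ B.T

# (A) Early integration: concatenate the views, then build a single kernel.
X_early = np.hstack(list(views.values()))
K_early = linear_kernel(X_early, X_early)
svm_early = SVC(kernel="precomputed").fit(K_early, y)

# (B) Intermediate integration: one kernel per view, combined (here: averaged)
# into a single kernel used to train one SVM.
K_views = [linear_kernel(X, X) for X in views.values()]
K_combined = np.mean(K_views, axis=0)
svm_intermediate = SVC(kernel="precomputed").fit(K_combined, y)

# (C) Late integration: one SVM per view; only the outputs are combined,
# here by averaging decision values as a simple stand-in for result fusion.
late_models = [SVC(kernel="precomputed").fit(K, y) for K in K_views]
decision = np.mean(
    [m.decision_function(K) for m, K in zip(late_models, K_views)], axis=0
)
late_predictions = (decision > 0).astype(int)

The precomputed-kernel interface makes the contrast explicit: the classifier is the same in all three cases, and only the point at which the views are merged changes.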

2.2 TYPE OF DATA
Considering the nature of the data under study, we can distinguish between the integration of homogeneous and heterogeneous data. In systems biology, data are said to be homogeneous if they assay the same molecular level, such as gene or protein expression or copy number variation. Conversely, heterogeneous data are derived from two or more different molecular levels. In the latter case, some challenges need to be tackled by the investigator. First of all, the data may differ in format and structure, ranging from discrete or continuous numerical values to more complex data such as sequences or graphs. Moreover, different data sources may be characterized by different noise levels, depending on the platform and on the technologies used to generate the data. For this reason, a batch effect removal step needs to be included in the integration phase, in order to bring the noise and the random or systematic errors of the different views to comparable levels [2].
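
As an illustration of this preprocessing step, here is a minimal sketch, assuming a single synthetic view measured in three batches; the per-view standardization and per-batch mean centering shown here are only rough stand-ins for dedicated batch effect removal methods such as ComBat.

# Minimal sketch of per-view standardization plus a crude per-batch
# mean-centering step. The view/batch layout is an illustrative assumption.
import numpy as np

def standardize(X):
    """Bring a view's features to zero mean and unit variance."""
    mu = X.mean(axis=0)
    sd = X.std(axis=0)
    sd[sd == 0] = 1.0                       # guard against constant features
    return (X - mu) / sd

def center_batches(X, batch_labels):
    """Remove per-batch mean shifts (a very rough batch-effect correction)."""
    X = X.copy()
    for b in np.unique(batch_labels):
        mask = batch_labels == b
        X[mask] -= X[mask].mean(axis=0)
    return X

rng = np.random.default_rng(0)
expression = rng.normal(size=(60, 20)) + 5.0    # hypothetical noisy view
batches = np.repeat([0, 1, 2], 20)              # samples from three platforms
expression[batches == 1] += 2.0                 # simulated systematic batch shift

cleaned = standardize(center_batches(expression, batches))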