Page 277 - Artificial Intelligence in the Age of Neural Networks and Brain Computing

270    CHAPTER 13 Multiview Learning in Biomedical Applications




                            The traditional approach to disease subtyping required the intervention of a clini-
                         cian, whose role was to single out anomalies in patterns or groups of outlier patients
                         on the basis of previous clinical experience. This task was usually accomplished as
                         an a posteriori analysis, and once a subgroup was selected a second retrospective
                         or prospective study was necessary to confirm the hypothesis of the existence of
                         a new class of patients. Nowadays, thanks to the availability of high-throughput
                         biotechnologies, it is possible to measure individual differences at the cellular and
                         molecular level. Moreover, the application of unsupervised automated techniques
                         for the analysis of high-throughput molecular data allows for unbiased biomedical
                         discoveries. Statistical methods and machine learning approaches (such as
                         nonnegative matrix factorization, hierarchical clustering, and probabilistic latent
                         factor analysis [8,9]) have been applied to identify subgroups of individuals showing
                         common patterns of gene expression levels. Other omics data can be used in combination
                         with gene expression to build more accurate models for patient stratification. For
                         example, somatic copy number alterations have proved to be promising biomarkers
                         for cancer subtype classification [10]. Other alternatives to be considered are
                         microRNA expression and methylation data. Due to the variety of available data,
                         data integration approaches to the problem of subtyping patients have recently drawn
                         the attention of the research community.
                            Nevertheless, the integration of heterogeneous omics data poses several computa-
                         tional challenges, since generally a small number of samples is available for a rela-
                         tively high number of variables and different preprocessing strategies need to be
                         applied for each type of data source. In addition, data are usually redundant, so proper
                         techniques are needed to extract only relevant information. Finally, care must be taken
                         in defining a coherent metric for studying relations between samples described
                         through heterogeneous modalities. Several of the recently proposed data integration
                         methods for patient subgroup discovery are based on supervised classification,
                         unsupervised clustering, or biclustering [11–14]. A few approaches based on
                         multiview clustering have been proposed for use on omics data. Two examples
                         of these methods are similarity network fusion (SNF) [15] and a multiview genomic
                         data integration methodology (MVDA) [16].
                            SNF [15] is an intermediate integration network fusion methodology able to
                         integrate multiple genomic data (e.g., mRNA expression, DNA methylation, and
                         microRNA expression data) to identify relevant patient subtypes. The method first
                         constructs a patient similarity network for each view. Then, it iteratively updates
                         each network with information coming from the other networks, making them more
                         similar at each step. At the end, this iterative process converges to a final fused
                         network (see Fig. 13.5).
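
                            The fusion loop described above can be illustrated with a minimal sketch.
                         This is a simplified illustration and not the reference implementation of
                         [15]: it assumes a plain Gaussian kernel with a fixed width, a k-nearest-
                         neighbor local kernel, and simple row normalization, and it needs at least two
                         views. The function names (affinity, knn_kernel, fuse), the parameters
                         (k, iters, sigma), and the toy data are hypothetical.

```python
import math

def affinity(X, sigma=1.0):
    """Gaussian similarity between every pair of patients (rows of X)."""
    return [[math.exp(-sum((a - b) ** 2 for a, b in zip(xi, xj)) / (2 * sigma ** 2))
             for xj in X] for xi in X]

def row_normalize(W):
    """Scale each row so that it sums to 1."""
    return [[w / sum(row) for w in row] for row in W]

def knn_kernel(W, k):
    """Keep each patient's k most similar patients (itself included), then row-normalize."""
    S = []
    for row in W:
        keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k])
        S.append([w if j in keep else 0.0 for j, w in enumerate(row)])
    return row_normalize(S)

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(r) for r in zip(*A)]

def fuse(views, k=3, iters=20, sigma=1.0):
    """Fuse one similarity network per view into a single patient network."""
    Ws = [affinity(X, sigma) for X in views]
    P = [row_normalize(W) for W in Ws]   # full (global) similarity per view
    S = [knn_kernel(W, k) for W in Ws]   # sparse (local) similarity per view
    n, m = len(P[0]), len(views)
    for _ in range(iters):
        new_P = []
        for v in range(m):
            others = [P[u] for u in range(m) if u != v]
            # Average the current networks of all other views ...
            avg = [[sum(O[i][j] for O in others) / len(others) for j in range(n)]
                   for i in range(n)]
            # ... and diffuse that average through this view's local structure.
            updated = matmul(matmul(S[v], avg), transpose(S[v]))
            new_P.append(row_normalize(updated))
        P = new_P
    fused = [[sum(Pv[i][j] for Pv in P) / m for j in range(n)] for i in range(n)]
    # Symmetrize so the result can be used as an undirected network.
    return [[(fused[i][j] + fused[j][i]) / 2 for j in range(n)] for i in range(n)]

# Toy data: six patients, two views, two well-separated groups of three.
view_a = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
view_b = [[1.0], [1.1], [0.9], [4.0], [4.2], [3.9]]
fused_network = fuse([view_a, view_b], k=3, iters=10)
```

                            The fused matrix can then be passed to any clustering method that accepts a
                         similarity matrix in order to obtain candidate patient subtypes.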
                            The MVDA methodology [16] aims to combine dimensionality reduction, variable
                         selection, clustering (for each available data type), and data integration methods to
                         find patient subtypes, as described in Fig. 13.6. First, the number of features for each
                         data type (genes, miRNAs, proteins, etc.) is reduced by means of a cluster-based
                         correlation analysis. Second, a rank-based method is employed to select the features
                         that best separate patients into subtypes. Third, clustering is used to identify patient