Page 277 - Artificial Intelligence in the Age of Neural Networks and Brain Computing


270 CHAPTER 13 Multiview Learning in Biomedical Applications

The traditional approach to disease subtyping required the intervention of a clini-
cian, whose role was to single out anomalies in patterns or groups of outlier patients
on the basis of previous clinical experience. This task was usually accomplished as
an a posteriori analysis, and once a subgroup was selected, a second retrospective
or prospective study was necessary to confirm the hypothesis that a new class of
patients existed. Nowadays, thanks to the availability of high-throughput
biotechnologies, it is possible to measure individual differences at the cellular and
molecular level. Moreover, the application of unsupervised automated techniques
for the analysis of high-throughput molecular data allows for unbiased biomedical
discoveries. Statistical methods and machine learning approaches, such as nonnegative
matrix factorization, hierarchical clustering, and probabilistic latent factor
analysis [8,9], have been applied to identify subgroups of individuals showing common
patterns of gene expression levels. Other omics data can be used in combination
with gene expression to build more accurate models for patient stratification. For
example, somatic copy number alterations have proved to be promising biomarkers
for cancer subtype classification [10]. Other candidates include microRNA
expression and methylation data. Due to the variety of available data,
data integration approaches to the problem of subtyping patients have recently drawn
the attention of the research community.
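
As a rough illustration of how one of the unsupervised techniques named above can group patients by expression pattern, the following sketch runs a small average-linkage hierarchical clustering. The correlation-based distance, the function names, and the toy profiles are assumptions made for this example; they are not taken from the cited studies.

```python
import math

def correlation_distance(x, y):
    """1 - Pearson correlation between two expression profiles.
    Assumes neither profile is constant (nonzero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return 1.0 - cov / (sx * sy)

def average_linkage(profiles, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain."""
    n = len(profiles)
    clusters = [[i] for i in range(n)]
    # Pairwise distances between patients, computed once.
    dist = {(i, j): correlation_distance(profiles[i], profiles[j])
            for i in range(n) for j in range(i + 1, n)}

    def linkage(a, b):
        # Average distance over all cross-cluster patient pairs.
        pairs = [dist[min(i, j), max(i, j)] for i in a for j in b]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        candidates = ((p, q) for p in range(len(clusters))
                      for q in range(p + 1, len(clusters)))
        a, b = min(candidates,
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return [sorted(c) for c in clusters]

# Toy data (hypothetical): four patients, four genes each.
profiles = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.5],
    [4.0, 3.0, 2.0, 1.0],
    [8.0, 6.0, 4.5, 2.0],
]
subgroups = average_linkage(profiles, n_clusters=2)
```

Correlation distance groups patients whose genes rise and fall together regardless of absolute scale, which is one common choice for expression data; other distances and linkage rules are equally possible.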
Nevertheless, the integration of heterogeneous omics data poses several computational
challenges: typically only a small number of samples is available for a relatively
large number of variables, and each type of data source requires its own preprocessing
strategy. In addition, data are usually redundant, so proper techniques are needed to
extract only the relevant information. Finally, care must be taken in defining a
coherent metric for studying relations between samples described through
heterogeneous modalities. Several of the recently proposed data integration methods
for patient subgroup discovery are based on supervised classification, unsupervised
clustering, or biclustering [11–14]. A few approaches based on multiview clustering
have also been proposed for use on omics data. Two examples
of these methods are similarity network fusion (SNF) [15] and a multiview genomic
data integration methodology (MVDA) [16].
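
To make the metric issue concrete, the sketch below shows one possible way to compare samples described by views with different scales and different numbers of variables. Every choice here is an assumption for illustration only: each feature is z-scored, each view's Euclidean distance is divided by the square root of its feature count, and the per-view distances are averaged. Neither SNF nor MVDA is implemented this way, and real omics data would normally need view-specific preprocessing first.

```python
import math

def zscore_columns(view):
    """Standardize each feature (column) of one view to mean 0 and unit variance.
    Constant features are mapped to 0."""
    out_cols = []
    for col in zip(*view):
        mean = sum(col) / len(col)
        std = math.sqrt(sum((x - mean) ** 2 for x in col) / len(col))
        out_cols.append([(x - mean) / std if std > 0 else 0.0 for x in col])
    return [list(row) for row in zip(*out_cols)]

def view_distance(view):
    """Sample-by-sample Euclidean distance on z-scored features, divided by
    sqrt(number of features) so that large views do not dominate."""
    z = zscore_columns(view)
    p = len(z[0])
    return [[math.sqrt(sum((a - b) ** 2 for a, b in zip(r, s)) / p) for s in z]
            for r in z]

def combined_distance(views):
    """Average the per-view distance matrices (rows = the same samples in each view)."""
    mats = [view_distance(v) for v in views]
    n = len(mats[0])
    return [[sum(m[i][j] for m in mats) / len(mats) for j in range(n)]
            for i in range(n)]

# Hypothetical toy views for three samples: two features vs. three features.
expression = [[1.0, 200.0], [2.0, 180.0], [9.0, 20.0]]
methylation = [[0.10, 0.90, 0.50], [0.12, 0.85, 0.55], [0.80, 0.20, 0.45]]
d = combined_distance([expression, methylation])
```

Because each view is standardized before distances are taken, rescaling one view (for example, changing its measurement units) leaves the combined distance essentially unchanged.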
SNF [15] is an intermediate-integration network fusion methodology able to
integrate multiple genomic data types (e.g., mRNA expression, DNA methylation, and
microRNA expression data) to identify relevant patient subtypes. The method first
constructs a patient similarity network for each view. Then, it iteratively updates
each network with information coming from the other networks, making them more
similar at each step. In the end, this iterative process converges to a final fused
network (see Fig. 13.5).
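
The iterative update described above can be sketched as a cross-diffusion between per-view similarity matrices. This is a simplified illustration in the spirit of SNF, not the published algorithm: the Gaussian affinity, the row normalization, the parameter names (`k`, `sigma`, `iterations`), and the toy data are assumptions chosen to keep the example short and dependency-free.

```python
import math

def affinity(samples, sigma=1.0):
    """Gaussian similarity between every pair of patients in one view."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return [[math.exp(-sq_dist(a, b) / (2 * sigma ** 2)) for b in samples]
            for a in samples]

def row_normalize(m):
    """Scale each row so that it sums to 1."""
    out = []
    for row in m:
        total = sum(row)
        out.append([x / total for x in row])
    return out

def knn_kernel(w, k):
    """Keep the k largest similarities per row (self included), then normalize."""
    sparse = []
    for row in w:
        keep = set(sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:k])
        sparse.append([x if j in keep else 0.0 for j, x in enumerate(row)])
    return row_normalize(sparse)

def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def fuse_networks(views, k=2, iterations=10, sigma=1.0):
    """Cross-diffuse the per-view networks; requires at least two views."""
    weights = [affinity(v, sigma) for v in views]
    full = [row_normalize(w) for w in weights]    # global similarity per view
    local = [knn_kernel(w, k) for w in weights]   # sparse local similarity per view
    n, m = len(full[0]), len(views)
    for _ in range(iterations):
        updated = []
        for v in range(m):
            # Average the current networks of all *other* views ...
            others = [full[u] for u in range(m) if u != v]
            mean_other = [[sum(p[i][j] for p in others) / len(others)
                           for j in range(n)] for i in range(n)]
            # ... and diffuse that average through this view's local structure.
            local_t = [list(r) for r in zip(*local[v])]
            diffused = matmul(matmul(local[v], mean_other), local_t)
            updated.append(row_normalize(diffused))
        full = updated
    fused = [[sum(p[i][j] for p in full) / m for j in range(n)] for i in range(n)]
    # Symmetrize so the result can be read as an undirected similarity network.
    return [[(fused[i][j] + fused[j][i]) / 2 for j in range(n)] for i in range(n)]

# Hypothetical toy data: four patients measured in two views.
view_a = [[0.0], [0.1], [5.0], [5.1]]
view_b = [[1.0, 0.0], [1.1, 0.1], [-3.0, 2.0], [-3.1, 2.1]]
fused = fuse_networks([view_a, view_b], k=2, iterations=5)
```

In this toy setting both views agree that patients 0 and 1 are similar and that patients 2 and 3 are similar, so the fused network keeps strong within-group similarities and weak cross-group ones. A clustering method applied to the fused matrix would then yield the patient subtypes.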
The MVDA methodology [16] aims to combine dimensionality reduction, variable
selection, clustering (for each available data type), and data integration methods to
find patient subtypes, as described in Fig. 13.6. First, the number of features for each
data type (genes, miRNAs, proteins, etc.) is reduced by means of a cluster-based
correlation analysis. Second, a rank-based method is employed to select the features
that best separate patients into subtypes. Third, clustering is used to identify patient
