Page 22 - Biosystems Engineering
Microarray Data Analysis Using Machine Learning Methods
tools that can mine and discover biologically meaningful knowledge
in such large and complex multivariate data are needed.
Often, “raw” microarray data are poorly suited for discovering
biological knowledge because of the substantial noise and technical
variability introduced at stages such as
labeling, hybridization, and scanning. Low-level analysis methods
are applied to the raw microarray data to reduce background noise
and normalize and transform the data into a form acceptable to a
selected analysis method. Once the data are properly preprocessed,
high-level analysis methods are applied to extract biologically
significant information. Typical tasks include identifying differentially
expressed genes, clustering genes to identify a new transduction pathway
or novel genes that may be coregulated through the same known pathway,
discovering or predicting unknown phenotypic classes, selecting
genes that may have a functional role in specific phenotypes, and
deciphering gene regulatory networks.
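The low-level steps described above (background reduction, transformation, and normalization) can be sketched in a few lines. The following is only a minimal illustration on synthetic intensities, not a method prescribed by this chapter: the function name `preprocess`, the `floor` parameter, and the choice of median centering as the normalization are all assumptions made for the example.

```python
import numpy as np

def preprocess(foreground, background, floor=1.0):
    """Background-correct, log2-transform, and median-center intensities.

    foreground, background: arrays of shape (genes, arrays).
    floor: hypothetical lower bound applied before the log so that
           non-positive corrected values do not produce NaN or -inf.
    """
    corrected = np.maximum(foreground - background, floor)
    logged = np.log2(corrected)
    # Subtract each array's median so every array ends up with median zero.
    return logged - np.median(logged, axis=0, keepdims=True)

# Synthetic data: 1000 genes on 4 arrays, each array with its own
# multiplicative technical bias plus a constant background level.
rng = np.random.default_rng(0)
true_signal = rng.lognormal(mean=8.0, sigma=1.0, size=(1000, 4))
scale = np.array([1.0, 1.5, 0.7, 2.0])
background = np.full((1000, 4), 50.0)
raw = true_signal * scale + background

normalized = preprocess(raw, background)
print(np.median(normalized, axis=0))
```

After this step the array-specific scale factors no longer shift the per-array medians, so expression values become more comparable across arrays before any high-level analysis is applied.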
Many computational methods have been proposed to perform
low- and high-level analysis of microarray data. Machine learning
methods have gained momentum in recent years for high-level
microarray data analysis because they can capture nonlinear
relationships and handle large-volume, high-dimensional data.
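To make the point about nonlinear relationships concrete, the sketch below compares a linear least-squares classifier with a 1-nearest-neighbor classifier on synthetic two-feature data whose class depends on the sign of the product of the features (an XOR-like pattern). Both the data and the choice of nearest neighbor as the nonlinear learner are assumptions made purely for illustration; nearest neighbor is not one of the methods surveyed in this chapter.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_xor(n):
    """Two synthetic features; the class is 1 when their product is positive."""
    magnitude = rng.uniform(0.2, 1.0, size=(n, 2))
    sign = rng.choice([-1.0, 1.0], size=(n, 2))
    x = magnitude * sign
    y = (x[:, 0] * x[:, 1] > 0).astype(int)
    return x, y

x_train, y_train = make_xor(400)
x_test, y_test = make_xor(400)

# Linear baseline: least-squares fit of labels in {-1, +1} on [x1, x2, 1].
design = np.column_stack([x_train, np.ones(len(x_train))])
weights, *_ = np.linalg.lstsq(design, 2.0 * y_train - 1.0, rcond=None)
test_design = np.column_stack([x_test, np.ones(len(x_test))])
linear_pred = (test_design @ weights > 0).astype(int)
linear_acc = np.mean(linear_pred == y_test)

# Nonlinear learner: each test point takes the label of its closest
# training point in Euclidean distance.
dists = np.linalg.norm(x_test[:, None, :] - x_train[None, :, :], axis=2)
knn_pred = y_train[np.argmin(dists, axis=1)]
knn_acc = np.mean(knn_pred == y_test)

print(f"linear accuracy: {linear_acc:.2f}, 1-NN accuracy: {knn_acc:.2f}")
```

No single straight line can separate the two classes in this layout, so the linear model stays close to chance, while the nearest-neighbor rule recovers the pattern from local structure.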
The chapter is organized as follows: Section 1.2 presents an overview of
machine learning methods. Section 1.3 describes the two commonly
used microarray technologies: cDNA and high-density oligonucleotide
microarrays. Section 1.4 highlights low-level analysis methods for back-
ground adjustment and normalization of gene expression data gener-
ated by the two types of microarray technologies. Section 1.5 reviews
high-level analysis methods for clustering, classification, and genetic
network modeling. Section 1.6 summarizes and concludes the chapter.
1.2 Machine Learning Methods
Machine learning is the field of scientific study that concentrates on
induction algorithms and on other algorithms that can be said to
“learn.” Induction algorithms take as input specific instances and
produce a model that generalizes beyond these instances. In this sec-
tion, we present an overview of machine learning methods such as
artificial neural networks (NNs), fuzzy systems (FS), genetic algo-
rithms (GAs), particle swarm optimization (PSO), and support vector
machines (SVMs). Most of these methods have their conceptual ori-
gins in biological systems. For example, NNs model biological neural
systems. FS originated from studies of how organisms interact with
their environment. They can be used to transform human expert
knowledge into a mathematical description. GAs model natural evo-
lution. PSO is an adaptive algorithm based on the social metaphor of