Page 22 - Biosystems Engineering

Microarray Data Analysis Using Machine Learning Methods       3

               tools that can mine and discover biologically meaningful knowledge
               in such large and complex multivariate data are needed.
                   Often, “raw” microarray data are not the best data for
               discovering biological knowledge, because of the substantial noise
               and technical variability introduced at stages such as
               labeling, hybridization, and scanning. Low-level analysis methods
               are applied to the raw microarray data to reduce background noise
               and normalize and transform the data into a form acceptable to a
               selected analysis method. Once the data are properly preprocessed,
               high-level analysis methods are applied to elucidate biologically sig-
               nificant information such as identifying differentially expressed
               genes, clustering genes to identify a new transduction pathway or
               novel genes that may be coregulated through the same known pathway,
               discovering or predicting unknown phenotypic classes, selecting
               genes that may have a functional role in specific phenotypes, and
               deciphering gene regulatory networks.
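                   The low-level steps named above (background reduction,
               transformation, and normalization) can be sketched in a few lines.
               This is only a minimal illustration under stated assumptions, not the
               procedure of any particular platform: the function name `preprocess`,
               the `floor` parameter, the choice of a log2 transform with per-array
               median centering, and the toy intensities are all hypothetical.

```python
import numpy as np

def preprocess(foreground, background, floor=1.0):
    """Background-subtract, log2-transform, and median-center each array.

    foreground, background: raw intensities, shape (genes, arrays).
    floor: hypothetical lower bound that keeps values positive before the log.
    """
    fg = np.asarray(foreground, dtype=float)
    bg = np.asarray(background, dtype=float)
    # Background reduction: subtract local background, clip at the floor.
    corrected = np.maximum(fg - bg, floor)
    # Transformation: log2 compresses the wide dynamic range of intensities.
    logged = np.log2(corrected)
    # Normalization: shift each array (column) so its median is zero.
    return logged - np.median(logged, axis=0, keepdims=True)

# Toy data: 4 genes measured on 2 arrays (made-up numbers for illustration).
raw_fg = np.array([[1200.0, 900.0],
                   [300.0, 260.0],
                   [5000.0, 4100.0],
                   [150.0, 90.0]])
raw_bg = np.array([[100.0, 80.0],
                   [100.0, 85.0],
                   [120.0, 90.0],
                   [110.0, 95.0]])
normalized = preprocess(raw_fg, raw_bg)
```

               After this step every array has a median log-intensity of zero, so
               values are comparable across arrays. Simple median centering is
               chosen here for brevity; intensity-dependent schemes would replace
               the last line of the function.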
                   Many computational methods have been proposed to perform
               low- and high-level analysis of microarray data. Machine learning
               methods have gained momentum in recent years for high-level
               microarray data analysis because they can capture nonlinear
               relationships and handle large-volume, high-dimensional data.
                   The chapter is organized as follows: Section 1.2 presents an overview of
               machine learning methods. Section 1.3 describes the two commonly
               used microarray technologies: cDNA and high-density oligonucleotide
               microarrays. Section 1.4 highlights low-level analysis methods for back-
               ground adjustment and normalization of gene expression data gener-
               ated by the two types of microarray technologies. Section 1.5 reviews
               high-level analysis methods for clustering, classification, and genetic
               network modeling. Section 1.6 summarizes and concludes the chapter.


          1.2 Machine Learning Methods
               Machine learning is the field of scientific study that concentrates on
               induction algorithms and on other algorithms that can be said to
               “learn.” Induction algorithms take as input specific instances and
               produce a model that generalizes beyond these instances. In this sec-
               tion, we present an overview of machine learning methods such as
               artificial neural networks (NNs), fuzzy systems (FS), genetic algo-
               rithms (GAs), particle swarm optimization (PSO), and support vector
               machines (SVMs). Most of these methods have their conceptual ori-
               gins in biological systems. For example, NNs model biological neural
               systems. FS originated from studies of how organisms interact with
               their environment. They can be used to transform human expert
               knowledge into a mathematical description. GAs model natural evo-
               lution. PSO is an adaptive algorithm based on the social metaphor of