Page 40 - Biosystems Engineering

Microarray Data Analysis Using Machine Learning Methods       21

                   Most classification algorithms perform suboptimally when given
               thousands of genes, so the genes most predictive of a phenotype
               must be selected first. Appropriate gene selection helps in
               achieving accurate classification. There are
               two objectives in gene selection: improving the prediction perfor-
               mance of the models and providing a better understanding of the
               underlying concepts that generated the data. Gene selection may
               start by filtering out genes with little or no fold change. A
               small subset of genes can be selected from the remaining genes using
               various techniques described in Sec. 1.4. Clustering methods can
               also be used to identify groups of coregulated genes; cluster centers
               of these groups can then be used as inputs to a classifier. Supervised
               methods identify the most informative genes using approaches such
               as (1) analysis of differential expression via a two-sample t-test,
               analysis of variance, etc., (2) selecting genes whose signal-to-noise
               ratio exceeds a prespecified cutoff, and (3) choosing genes that are
               correlated with an expected outcome (e.g., class labels). Optimization
               methods can also be used, in which a subset of genes is selected
               recursively (sequentially or via “evolutionary” trial and error) and
               the combination of genes with the best classification performance
               is retained.
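
The filter-then-rank idea described above can be sketched in a few lines. This is a minimal illustration and not a method taken from the text: it assumes log2-scale expression values, two classes, and nonzero within-class variance. The gene names, the fold-change cutoff `min_abs_lfc`, and the choice of Welch's unequal-variance form of the t-statistic are all assumptions made for the example. The signal-to-noise score uses one common definition, the difference of class means divided by the sum of class standard deviations.

```python
import math
from statistics import mean, stdev


def log2_fold_change(a, b):
    """Difference of mean log2 expression between two groups of samples."""
    return mean(a) - mean(b)


def welch_t(a, b):
    """Two-sample t-statistic with unequal variances (Welch's form)."""
    se = math.sqrt(stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b))
    return (mean(a) - mean(b)) / se


def signal_to_noise(a, b):
    """Signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b)."""
    return (mean(a) - mean(b)) / (stdev(a) + stdev(b))


def select_genes(expr, labels, min_abs_lfc=1.0, top_k=2, score=welch_t):
    """Filter genes by fold change, then keep the top_k by absolute score.

    expr   : dict mapping gene name -> list of log2 expression values
    labels : list of 0/1 class labels, in the same sample order
    """
    ranked = []
    for gene, values in expr.items():
        a = [v for v, y in zip(values, labels) if y == 1]
        b = [v for v, y in zip(values, labels) if y == 0]
        if abs(log2_fold_change(a, b)) < min_abs_lfc:
            continue  # filtering step: little or no fold change
        ranked.append((abs(score(a, b)), gene))
    ranked.sort(reverse=True)
    return [gene for _, gene in ranked[:top_k]]


# Hypothetical toy data: three genes measured in six samples.
labels = [1, 1, 1, 0, 0, 0]
expr = {
    "geneA": [8.0, 8.2, 7.9, 5.0, 5.1, 4.8],  # large, consistent difference
    "geneB": [6.0, 6.1, 5.9, 6.0, 6.2, 5.8],  # essentially no fold change
    "geneC": [7.0, 9.0, 8.0, 6.0, 7.0, 5.0],  # real but noisier difference
}
selected = select_genes(expr, labels)
```

On this toy data, geneB is removed by the fold-change filter and the remaining genes are ranked with geneA ahead of geneC under either score.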
                   Molecular classification based on machine learning algorithms
               has been shown to have statistical and clinical relevance for a variety
               of tumor types: leukemia (Golub et al. 1999), lymphoma (Shipp et al.
               2002), brain cancer (Pomeroy et al. 2002), lung cancer (Bhattacharjee
               et al. 2001), and the classification of multiple primary tumors (Ramas-
               wamy et al. 2001). The performance of machine learning methods in
               classifying microarray data can be enhanced if the most informative
               genes are used. For example, Guyon et al. (2002) applied a gene
               selection method based on SVM with recursive feature elimination.
               They demonstrated experimentally that the selected genes yielded
               improved classification performance.
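
The recursive elimination loop can be illustrated with a small sketch. It is not the implementation of Guyon et al. (2002): to stay dependency-free, the SVM is replaced here by a plain least-squares linear classifier trained by gradient descent, and only the outer loop (train, rank genes by squared weight, drop the lowest-ranked gene, repeat) follows the recursive feature elimination idea. The function names, learning rate, epoch count, and toy data are all assumptions.

```python
def train_linear(X, y, epochs=200, lr=0.01):
    """Fit weights by gradient descent on squared error (no bias term).

    Stand-in for the linear SVM; X is a list of samples (rows),
    y holds class labels coded as +1 / -1.
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for row, target in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, row)) - target
            for j, xj in enumerate(row):
                grad[j] += err * xj
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def recursive_feature_elimination(X, y, n_keep):
    """Retrain repeatedly, each time dropping the gene with the smallest w**2."""
    active = list(range(len(X[0])))  # column indices of surviving genes
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        w = train_linear(X_active, y)
        worst = min(range(len(active)), key=lambda k: w[k] ** 2)
        del active[worst]
    return active


# Hypothetical centered expression values: 4 samples x 3 genes.
# Only gene 2 tracks the class label; genes 0 and 1 are unrelated to it.
y = [1, 1, -1, -1]
X = [
    [1.0, 1.0, 2.0],
    [-1.0, -1.0, 2.0],
    [1.0, -1.0, -2.0],
    [-1.0, 1.0, -2.0],
]
survivors = recursive_feature_elimination(X, y, n_keep=1)
```

In this toy case the two uninformative genes receive zero weight and are eliminated first, leaving gene 2. With real data the weights change each time a gene is removed, which is why the classifier is retrained at every step.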


               1.5.3 Genetic Network Modeling
               With the help of global expression data—especially using time series
               microarray data—one can attempt to reverse engineer a network of
               gene interactions. Characterizing gene interactions has many
               benefits; for example, the effects of drugs on a regulatory pathway
               can be characterized and tumor development in cells can be tracked.
               Several methods have been proposed to develop maps of gene
               interaction, including linear equations (D’haeseleer et al. 1999; Weaver et al.
               1999), differential equations (Chen et al. 1999), Boolean networks
               (Liang et al. 1998; Shmulevich et al. 2002), fuzzy logic–based methods
               (Woolf and Wang 2000; Ressom et al. 2003a), correlation-based
               approaches (Herrero et al. 2003; Schmitt et al. 2004), and Bayesian
               networks (Friedman et al. 2000).
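
To make the Boolean network idea concrete, the sketch below takes a binarized time series (each gene is 0 = off or 1 = on at each time point) and searches, for every gene, for a single regulator whose state at time t reproduces or inverts the gene's state at time t+1. This is a deliberately simplified toy and not the algorithm of any of the cited papers, which handle multi-input rules and noisy data. The function name and the three-gene series are assumptions made for the example.

```python
def infer_single_input_rules(states):
    """Find single-regulator Boolean rules consistent with a time series.

    states : list of time points, each a tuple of 0/1 gene states
    returns: dict mapping target gene index -> list of (regulator, effect)
             pairs, where effect is "activates" or "represses"
    """
    n_genes = len(states[0])
    transitions = list(zip(states[:-1], states[1:]))
    rules = {}
    for target in range(n_genes):
        consistent = []
        for reg in range(n_genes):
            # Rule "target(t+1) = reg(t)" holds on every transition
            if all(nxt[target] == cur[reg] for cur, nxt in transitions):
                consistent.append((reg, "activates"))
            # Rule "target(t+1) = NOT reg(t)" holds on every transition
            if all(nxt[target] == 1 - cur[reg] for cur, nxt in transitions):
                consistent.append((reg, "represses"))
        rules[target] = consistent
    return rules


# Hypothetical binarized series for three genes, generated by the loop
# gene0 -> gene1 -> gene2 -| gene0 (the last link is a repression).
series = [
    (1, 0, 0),
    (1, 1, 0),
    (1, 1, 1),
    (0, 1, 1),
    (0, 0, 1),
    (0, 0, 0),
    (1, 0, 0),
]
rules = infer_single_input_rules(series)
```

For this series the search recovers the three interactions that generated it. With short or noisy real time series, several candidate rules (or none) may be consistent with the data.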