Page 44 - Biosystems Engineering

Microarray Data Analysis Using Machine Learning Methods

               expression data using the Synechocystis microarray with a 20-min
               sampling interval to study its physiological response to changing
               light conditions. Compared with earlier time-series experiments using
               DNA microarrays, this shorter sampling interval allowed time-lagged
               correlations to identify directional transcriptional relationships on a 20-min
               time scale. Khanin and Wit (2007) constructed a gene network of
               Plasmodium falciparum using a method that combines two types of
               correlations between each pair of genes: standard Pearson and partial
               correlations. The topology of the malaria gene expression network they
               obtained is consistent with scale-free behavior, similar to other biologi-
               cal networks.
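
               The two correlation-based ideas in this paragraph can be illustrated
               with a minimal sketch. It assumes NumPy, measures the lag in samples
               (so a lag of 1 would correspond to 20 min at the sampling interval
               mentioned above), and computes full-order partial correlations from
               the inverse covariance matrix. That choice is an assumption made for
               illustration and is not necessarily the procedure used in the cited
               studies. The function names and the toy data generator simulate_chain
               are hypothetical.

```python
import numpy as np


def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]; lag is in samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape or x.ndim != 1:
        raise ValueError("x and y must be 1-D series of equal length")
    if lag < 0 or len(x) - lag < 3:
        raise ValueError("lag must be >= 0 and leave at least 3 paired points")
    if lag > 0:
        # Pair each value of x with the value of y observed `lag` samples later.
        x, y = x[:-lag], y[lag:]
    return float(np.corrcoef(x, y)[0, 1])


def partial_correlations(data):
    """Partial correlation of each gene pair given all other genes.

    `data` has shape (samples, genes); this sketch assumes more samples
    than genes so that the covariance matrix is well conditioned.
    """
    cov = np.cov(np.asarray(data, dtype=float), rowvar=False)
    precision = np.linalg.pinv(cov)
    scale = np.sqrt(np.diag(precision))
    # Standard identity: pcor_ij = -P_ij / sqrt(P_ii * P_jj).
    pcor = -precision / np.outer(scale, scale)
    np.fill_diagonal(pcor, 1.0)
    return pcor


def simulate_chain(n_samples, seed=0):
    """Hypothetical toy data: gene A drives B, and B drives C."""
    rng = np.random.default_rng(seed)
    a = rng.normal(size=n_samples)
    b = a + 0.5 * rng.normal(size=n_samples)
    c = b + 0.5 * rng.normal(size=n_samples)
    return np.column_stack([a, b, c])


if __name__ == "__main__":
    data = simulate_chain(2000)
    print("Pearson A-C:", np.corrcoef(data, rowvar=False)[0, 2])
    print("Partial A-C given B:", partial_correlations(data)[0, 2])
```

               In the toy chain, A and C are strongly Pearson-correlated only
               because both depend on B, so their partial correlation should be
               close to zero. This is the kind of indirect link that combining the
               two measures is meant to separate from direct ones.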
                   Friedman et al. (2000) discussed how Bayesian networks describe
               interactions between genes and demonstrated their method on the
               S. cerevisiae cell-cycle measurements of Spellman et al. (1998).
               Pe’er et al. (2001) extended this work by integrating a new
               discretization procedure and a principled way of learning from a
               mixture of observational and interventional data.
               To test Bayesian networks on gene expression data, several
               studies assessed the inference results on real (Zak et al. 2001) or simu-
               lated (Smith et al. 2002; Husmeier 2003) gene expression data, which
               allow estimating the proportion of spurious gene interactions incurred
               for a specified target proportion of recovered true interactions.
               These studies demonstrate how network inference
               performance varies with the training set size, the degree of inade-
               quacy of prior assumptions, the experimental sampling strategy, and
               the inclusion of further, sequence-based information.
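
               The evaluation criterion described above, the share of spurious
               interactions incurred at a chosen share of recovered true
               interactions, can be sketched as follows. The sketch assumes that
               each candidate edge has a confidence score and that the true network
               is known, as with simulated data. The function name, the return
               format, and the definition of the spurious fraction as spurious edges
               divided by accepted edges are assumptions made for illustration, not
               the exact measure used in the cited studies.

```python
import numpy as np


def spurious_at_recall(scores, truth, target_recall):
    """Accept top-scoring edges until the target share of true edges is recovered.

    Returns (edges accepted, spurious edges among them, spurious fraction).
    Tied scores are broken by input order, which is a simplification.
    """
    scores = np.asarray(scores, dtype=float)
    truth = np.asarray(truth, dtype=bool)
    if scores.shape != truth.shape or scores.ndim != 1:
        raise ValueError("scores and truth must be 1-D arrays of equal length")
    n_true = int(truth.sum())
    if n_true == 0:
        raise ValueError("truth must contain at least one true edge")
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")

    # Rank edges from most to least confident.
    order = np.argsort(-scores, kind="stable")
    recovered = np.cumsum(truth[order])

    # Number of true edges that must be recovered; the small offset guards
    # against floating-point round-up in target_recall * n_true.
    needed = max(1, int(np.ceil(target_recall * n_true - 1e-9)))

    # Smallest number of top-ranked edges that contains `needed` true edges.
    accepted = int(np.searchsorted(recovered, needed)) + 1
    spurious = accepted - int(recovered[accepted - 1])
    return accepted, spurious, spurious / accepted


if __name__ == "__main__":
    edge_scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]
    edge_is_true = [1, 0, 1, 1, 0, 1]
    print(spurious_at_recall(edge_scores, edge_is_true, 0.5))
```

               Repeating this calculation over training sets of different sizes or
               over different sampling strategies gives the kind of comparison
               these studies report.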
                   Approaches that integrate gene expression data with other types
               of genomic information (e.g., promoter sequence) have also been pro-
               posed. Given known transcription factors (TFs), some researchers
               (Roulet et al. 1998; Krivan and Wasserman 2001; Grabe 2002; Halfon
               et al. 2002) have tried to find their binding motifs in the regions
               upstream of genes. Others have tried to predict gene targets of TFs
               using genomewide sequence searches of promoter regions for a TF
               whose target motif is known (Schuldiner et al. 1998; Zhu et al. 2002).
               A more powerful approach to determine targets of TFs whose binding
               motifs are unknown is the combined use of genomewide location and
               gene expression analyses. Genomewide location analysis (protein–DNA
               binding; ChIP–chip) combines chromatin immunoprecipitation with
               microarray hybridization. In yeast, researchers
               have used this method to identify the targets of many TFs. On the
               other hand, gene expression data analysis uses a computational
               approach to identify the targets of TFs from time-course gene expression
               profiles. Bar-Joseph et al. (2003) and Hartemink et al. (2002)
               developed algorithms that combine information from genomewide
               location and expression profiles to decipher regulatory networks.
               Qian et al. (2003) constructed a gene network using the data