Page 44 - Biosystems Engineering
Microarray Data Analysis Using Machine Learning Methods 25
expression data using the Synechocystis microarray with a 20-min
sampling interval to study its physiological response to changing light
conditions. Compared with earlier DNA microarray time-series
experiments, this shorter sampling interval allowed time-lagged
correlations to identify directional transcriptional relationships on a
20-min time scale. Khanin and Wit (2007) constructed the gene network of
Plasmodium falciparum using a method that combines two types of
correlation between each pair of genes: standard Pearson correlation and
partial correlation. The topology of the malaria gene expression network
they obtained is consistent with scale-free behavior, similar to other
biological networks.
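The correlation measures mentioned above can be illustrated with a minimal sketch. It is not the procedure of any of the cited studies: the function names, the toy profiles, and the restriction to a first-order partial correlation (conditioning on a single third gene) are assumptions made here for illustration only.

```python
from math import sqrt


def pearson(x, y):
    """Standard Pearson correlation of two equal-length expression profiles.

    Assumes neither profile is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def lagged_correlation(x, y, lag):
    """Correlate x at time t with y at time t + lag (lag in sampling steps)."""
    if lag < 0 or lag > len(x) - 2:
        raise ValueError("lag must leave at least two paired time points")
    return pearson(x[:len(x) - lag], y[lag:])


def partial_correlation(x, y, z):
    """First-order partial correlation of x and y after controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


# Hypothetical toy profiles: y repeats x one sampling step later.
x = [1, 2, 3, 4, 5, 4, 3, 2]
y = [0, 1, 2, 3, 4, 5, 4, 3]

# The lag with the highest correlation suggests the direction and delay
# of a putative relationship (here, x leading y by one step).
best_lag = max(range(4), key=lambda k: lagged_correlation(x, y, k))
```

With a 20-min sampling interval, a best lag of one step would correspond to a delay of about 20 min. A partial correlation close to zero between two genes that have a high Pearson correlation suggests that their association is explained by the third gene rather than by a direct interaction.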
Friedman et al. (2000) discussed how Bayesian networks describe
interactions between genes and demonstrated their method on the
S. cerevisiae cell-cycle measurements of Spellman et al. (1998).
Pe’er et al. (2001) extended the work of Friedman et al. (2000) by
integrating a new discretization procedure and a principled way of
learning from a mixture of observational and interventional data. To
test Bayesian networks on gene expression data, several studies assessed
the inference results on real (Zak et al. 2001) or simulated (Smith et
al. 2002; Husmeier 2003) gene expression data, making it possible to
estimate the proportion of spurious gene interactions incurred for a
specified target proportion of recovered true interactions. These
studies demonstrate how network inference performance varies with the
training set size, the degree of inadequacy of the prior assumptions,
the experimental sampling strategy, and the inclusion of additional
sequence-based information.
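The evaluation criterion described above (spurious interactions incurred for a target proportion of recovered true interactions) can be sketched as follows when a reference network is known, as with simulated data. This is an illustrative sketch only: the function name, the edge scores, and the choice to normalize spurious edges by the number of scored non-edges (a false-positive rate) are assumptions, not details taken from the cited studies.

```python
import math


def spurious_rate_at_recall(edge_scores, true_edges, target_recall):
    """Fraction of scored non-edges accepted at a target recall.

    edge_scores: dict mapping (regulator, target) -> confidence score,
                 for example a posterior edge probability (assumed input).
    true_edges:  set of (regulator, target) pairs in the reference network.
    The threshold is lowered edge by edge until the target proportion of
    true edges has been recovered. Tied scores are not treated specially.
    """
    if not 0 < target_recall <= 1:
        raise ValueError("target_recall must be in (0, 1]")
    ranked = sorted(edge_scores, key=edge_scores.get, reverse=True)
    n_true = sum(1 for edge in ranked if edge in true_edges)
    n_false = len(ranked) - n_true
    if n_true == 0 or n_false == 0:
        raise ValueError("need both true and false candidate edges")
    # Small tolerance guards against floating-point overshoot before ceil.
    needed = math.ceil(target_recall * n_true - 1e-9)
    tp = fp = 0
    for edge in ranked:
        if tp >= needed:
            break
        if edge in true_edges:
            tp += 1
        else:
            fp += 1
    return fp / n_false


# Hypothetical scores for four candidate edges; two are in the reference.
scores = {("A", "B"): 0.9, ("B", "C"): 0.8, ("A", "C"): 0.7, ("C", "D"): 0.4}
reference = {("A", "B"), ("A", "C")}
rate_half = spurious_rate_at_recall(scores, reference, 0.5)
rate_full = spurious_rate_at_recall(scores, reference, 1.0)
```

In this toy case, recovering half of the true edges costs no spurious edges, whereas recovering all of them requires accepting one of the two scored non-edges. Repeating the calculation over a range of targets traces out a curve of the kind used to compare inference settings.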
Approaches that integrate gene expression data with other types
of genomic information (e.g., promoter sequence) have also been pro-
posed. Given known transcription factors (TFs), some researchers
(Roulet et al. 1998; Krivan and Wasserman 2001; Grabe 2002; Halfon
et al. 2002) have tried to find their binding motifs in the regions
upstream of genes. Others have tried to predict gene targets of TFs
using genomewide sequence searches of promoter regions for a TF
whose target motif is known (Schuldiner et al. 1998; Zhu et al. 2002).
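A sequence search of this kind can be sketched in a few lines. The sketch below swaps in plain consensus matching with IUPAC ambiguity codes for simplicity; practical searches often use position weight matrices instead. The function names, the consensus string, and the promoter sequences are hypothetical.

```python
import re

# IUPAC nucleotide ambiguity codes expressed as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(consensus):
    """Reverse complement of a consensus, including ambiguity codes."""
    return consensus.upper().translate(COMPLEMENT)[::-1]


def motif_regex(consensus):
    """Compile a consensus into a lookahead regex so overlaps are found."""
    pattern = "".join(IUPAC[base] for base in consensus.upper())
    return re.compile("(?=(" + pattern + "))")


def find_targets(promoters, consensus):
    """Map each gene to the sorted start positions of motif matches.

    promoters: dict mapping gene name -> upstream sequence (assumed input).
    Both strands are searched; positions refer to the given strand.
    Genes without a match are omitted from the result.
    """
    forward = motif_regex(consensus)
    reverse = motif_regex(reverse_complement(consensus))
    hits = {}
    for gene, seq in promoters.items():
        seq = seq.upper()
        starts = {m.start() for m in forward.finditer(seq)}
        starts |= {m.start() for m in reverse.finditer(seq)}
        if starts:
            hits[gene] = sorted(starts)
    return hits


# Hypothetical promoters and a made-up consensus (K stands for G or T).
promoters = {
    "geneA": "TTTCACGTGAAA",
    "geneB": "TTTTTTTTTTTT",
    "geneC": "aaCACGTTaa",
}
targets = find_targets(promoters, "CACGTK")
```

Here `targets` lists geneA and geneC as candidate targets and omits geneB. Exact consensus matching is a deliberately crude criterion, so the result is best read as a list of candidates to be checked against other evidence.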
A more powerful approach for determining the targets of TFs whose
binding motifs are unknown is the combined use of genomewide location
and gene expression analyses. Genomewide location analysis (protein–DNA
binding; ChIP–chip) combines chromatin immunoprecipitation with
microarray hybridization. In yeast, researchers have used this method to
identify the targets of many TFs. Gene expression data analysis, on the
other hand, uses a computational approach to identify the targets of TFs
from time-course gene expression profiles. Bar-Joseph et al. (2003) and
Hartemink et al. (2002) developed algorithms that combine information
from genomewide location and expression profiles to decipher regulatory
networks. Qian et al. (2003) constructed a gene network using the data