Page 21 - Biosystems Engineering
and detection of gene splicing variants and members of multigene
families. In Arabidopsis spp., the microarray has been used to monitor
stochastic and epigenetic changes in gene expression of natural and
chemically induced polyploids. The technique also allowed genome-
wide fingerprinting of closely related species with no prior DNA
sequence information (Jaccoud et al. 2001; Lezar et al. 2004).
In the field of animal husbandry, the prospect of simultaneous,
genome-wide, high-throughput analysis of gene expression opens
novel strategies to improve breeding (Ushizawa et al. 2004), to study
nutrition and tissue physiology (Band et al. 2002; Rodriguez-Zas 2003),
to identify disease-resistance genes (Diez-Tascon et al. 2005), and to
locate treatment targets for diseases (Huang et al. 2003). In addition,
microarray techniques have been used to evaluate gene expression
across different tissues and developmental stages, and in relation to
genomic duplication events, viral infection, drug treatments, tumors,
and aging in animals.
Microarray technology generates expression patterns for thousands
of genes at once. In addition to the high dimensionality and
complexity of gene expression data, there are many unknown and
undiscovered functional relations in the physical delivery system
used for collecting the data itself. Also, current microarray
technologies provide data that are associated with a substantial amount
of noise. These unknown, possibly nonlinear, relations and the noise
adversely affect the extraction of relevant information from the data.
To address this challenge,
the first step is to use experimental designs that yield reliable,
high-quality data with appropriate statistical power for detecting
differences between treatment and control groups with respect to
gene expression levels. It is essential to carry out replicate studies
for key aspects of experiments so that measures of variability are
available for testing hypotheses. The next step toward addressing
this challenge is to understand the nature and properties of the data
structure generated by current microarray technologies. Understanding
the data structure helps in selecting or designing appropriate
tools for analysis and for compensating for technical variability. The
discussion in this chapter focuses mainly on analysis of microarray
data, assuming that appropriate experimental design was carried
out prior to data generation.
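
The role of replicates in hypothesis testing can be illustrated with a minimal sketch. It assumes log2 expression values and a Welch t statistic computed separately for each gene; the gene names, the values, and the function name welch_t are hypothetical and chosen only for illustration, and this is not presented as the procedure used in the studies cited above.

```python
import math
import statistics


def welch_t(treatment, control):
    """Return (t statistic, approximate degrees of freedom) for two groups.

    Each argument is a list of replicate expression values for one gene.
    The degrees of freedom use the Welch-Satterthwaite approximation.
    """
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        # Without replicates there is no estimate of variability.
        raise ValueError("each group needs at least two replicates")
    m1, m2 = statistics.mean(treatment), statistics.mean(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the difference
    if se2 == 0:
        raise ValueError("zero variance in both groups; t is undefined")
    t = (m1 - m2) / math.sqrt(se2)
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical log2 expression values: three replicates per group.
expression = {
    "gene_A": {"treatment": [9.0, 10.0, 11.0], "control": [6.0, 7.0, 8.0]},
    "gene_B": {"treatment": [7.0, 8.0, 9.0], "control": [7.0, 8.0, 9.0]},
}

results = {
    gene: welch_t(groups["treatment"], groups["control"])
    for gene, groups in expression.items()
}
for gene, (t, df) in results.items():
    print(f"{gene}: t = {t:.2f}, df = {df:.1f}")
```

With a single array per group, the variance terms cannot be estimated and the function raises an error, which is the practical reason replicate studies are needed. In a real experiment with thousands of genes, the per-gene statistics would also require a correction for multiple testing, which this sketch omits.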
Microarray data can be organized as matrices in which the rows
represent genes (or clones) and the columns represent various sample
phenotypes or experimental conditions. Each entry in the matrix
corresponds to the expression level of a gene for a given condition or
sample. The set of entries in a row or a column forms an expression
pattern. A gene expression data matrix may consist of tens of thousands
of rows (genes) and tens to hundreds of columns (samples). Although
microarray technology provides researchers with such large amounts of
gene expression data, analysis of these massive data sets has been one of the
major bottlenecks in using the technology effectively. Computational