Page 21 - Biosystems Engineering

2    Chapter One

               and detection of gene splicing variants and members of multigene
               families. In Arabidopsis spp., the microarray has been used to monitor
               stochastic and epigenetic changes in gene expression of natural and
               chemically induced polyploids. The technique also allowed genome-
               wide fingerprinting of closely related species with no prior DNA
               sequence information (Jaccoud et al. 2001; Lezar et al. 2004).
                   In the field of animal husbandry, the prospect of simultaneous,
               genome-wide, high-throughput analysis of gene expression opens
               novel strategies to improve breeding (Ushizawa et al. 2004), to study
               nutrition and tissue physiology (Band et al. 2002; Rodriguez-Zas 2003),
               to identify disease-resistance genes (Diez-Tascon et al. 2005), and to
               locate treatment targets for diseases (Huang et al. 2003). In addition,
               microarray techniques have been used to evaluate gene expression in
               different tissues, developmental stages, genomic duplication events,
               viral infection, drug treatments, tumors, and aging in animals.
                   Microarray technology generates expression patterns for
               thousands of genes at once. In addition to the high dimensionality
               and complexity of gene expression data, there are many unknown and
               undiscovered functional relations in the physical delivery system
               used for collecting the data itself. Also, current microarray
               technologies provide data that are associated with a substantial
               amount of noise. These nonlinearities and the noise adversely
               affect the extraction of relevant information from the data. To
               address this challenge, the first step is to use experimental
               designs that will ensure high-quality, reliable data with
               appropriate statistical power for detecting
               differences between treatment and control groups with respect to
               gene expression levels. It is essential to carry out replicate studies
               for key aspects of experiments so that measures of variability are
               available for testing hypotheses. The next step toward addressing
               this challenge is to understand the nature and properties of the data
               structure generated by current microarray technologies.
               Understanding the data structure helps in selecting or designing
               appropriate tools for analysis and for compensating for technical
               variability. The
               discussion in this chapter focuses mainly on analysis of microarray
               data, assuming that appropriate experimental design was carried
               out prior to data generation.
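
The role of replicates described above can be illustrated with a small sketch. It assumes log2-scale expression values and uses Welch's t statistic as one possible per-gene test; the text does not prescribe a specific test, and the gene names, values, and the helper `welch_t` are hypothetical. The point is only that the variability term in the denominator can be estimated solely because each group has replicate measurements.

```python
import math
import statistics


def welch_t(treatment, control):
    """Return Welch's t statistic and approximate degrees of freedom
    for one gene, given replicate measurements in two groups."""
    n1, n2 = len(treatment), len(control)
    if n1 < 2 or n2 < 2:
        # Without replicates there is no estimate of variability.
        raise ValueError("at least two replicates per group are needed")
    m1, m2 = statistics.mean(treatment), statistics.mean(control)
    v1, v2 = statistics.variance(treatment), statistics.variance(control)
    se2 = v1 / n1 + v2 / n2  # squared standard error of the mean difference
    if se2 == 0:
        raise ValueError("zero variance in both groups")
    t = (m1 - m2) / math.sqrt(se2)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = se2 ** 2 / ((v1 / n1) ** 2 / (n1 - 1) + (v2 / n2) ** 2 / (n2 - 1))
    return t, df


# Hypothetical log2 expression values, three replicates per group.
expression = {
    "gene_A": {"treatment": [8.1, 8.4, 8.3], "control": [6.0, 6.2, 5.9]},
    "gene_B": {"treatment": [7.0, 7.6, 6.5], "control": [7.1, 6.8, 7.3]},
}

results = {
    gene: welch_t(groups["treatment"], groups["control"])
    for gene, groups in expression.items()
}

for gene, (t, df) in results.items():
    print(f"{gene}: t = {t:.2f}, df = {df:.1f}")
```

In a real study the resulting statistics would still need to be converted to p-values and corrected for the thousands of genes tested in parallel; that step is left out of this sketch.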
                   Microarray data can be organized as matrices in which the rows
               represent genes (or clones) and the columns represent sample
               phenotypes or experimental conditions. Each entry in the matrix
               corresponds to the expression level of a gene for a given condition
               or sample. The set of entries in a row or a column forms an
               expression pattern. A gene expression data matrix may consist of
               tens of thousands of rows (genes) and tens to hundreds of columns
               (samples). Although microarray technology provides researchers with
               such large amounts of gene expression data, analysis of these data
               has been one of the major bottlenecks in using the technology
               effectively. Computational