Biosystems Engineering

Microarray Data Analysis Using Machine Learning Methods

               The resulting model is tested on the validation sample. The process is
               repeated until every sample has appeared in the validation set. In the
               holdout method, only a single held-out subset (the validation set) is
               used to estimate the generalization error. Thus, unlike cross-validation,
               the holdout method does not rotate samples between the training and
               validation roles. In bootstrapping, a sample is drawn at random, with
               replacement, from the full training dataset at each iteration.
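               The three resampling schemes above differ only in how sample indices are
               assigned to training and validation. The sketch below generates those
               index sets; model fitting and scoring are omitted. The function names
               (k_fold_splits, holdout_split, bootstrap_sample) are hypothetical, and
               evaluating a bootstrap model on the "out-of-bag" samples that were never
               drawn is a common convention assumed here, not something stated in the text.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train, validation) index lists; each sample is validated exactly once.

    Setting k = n_samples gives leave-one-out cross-validation.
    """
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k disjoint folds covering all samples
    for i in range(k):
        val = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, val


def holdout_split(n_samples, val_fraction=0.3, seed=0):
    """Single split: one fixed validation subset, no rotation of roles."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    n_val = int(round(n_samples * val_fraction))
    return idx[n_val:], idx[:n_val]


def bootstrap_sample(n_samples, seed=0):
    """Draw n_samples indices with replacement; return them plus the out-of-bag indices."""
    rng = random.Random(seed)
    train = [rng.randrange(n_samples) for _ in range(n_samples)]
    out_of_bag = sorted(set(range(n_samples)) - set(train))
    return train, out_of_bag


if __name__ == "__main__":
    for train, val in k_fold_splits(10, 5):
        print(len(train), sorted(val))
    print(holdout_split(10))
    print(bootstrap_sample(10))
```

               Because bootstrap draws are made with replacement, some indices repeat in
               the training list while others never appear; those left out form a natural
               validation set for that iteration.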



          1.3 Microarray Technology
               Construction of microarrays generally depends on information gained
               from genome sequencing or high-throughput expressed sequence tag
               (EST) sequencing projects, which provide large sets of annotated
               clones and sequences. Based on the method used to apply DNA onto
               the slides, there are two major types of microarrays: predesigned
               chips and spotted microarrays. In predesigned chips, oligonucleotides
               15 to 25 nucleotides long are synthesized directly on the chip by
               photolithography. In spotted microarrays, by contrast, a spotting
               robot or an inkjet device deposits genomic DNA, cDNA, or 50- to 70-mer
               oligonucleotides onto the slide. Of the two types, spotted microarrays
               are cheaper to produce and offer more flexibility in experimental
               design and data analysis. Detection sensitivity decreases from cDNA
               probes to long oligonucleotides to short oligonucleotides, whereas
               specificity increases in the same order. Thus, short oligonucleotide
               microarrays have the potential to detect splicing variants and members
               of multigene families. In contrast, because of their higher
               sensitivity, cDNA microarrays are more suitable for gene expression
               studies in related species.


               1.3.1 cDNA Microarray
               The construction of cDNA microarrays begins with the production of
               cDNA segments that represent each gene. Each segment is the
               complement of the actual DNA sequence of a gene and differs from the
               corresponding mRNA sequence only in that it contains thymine where
               the mRNA contains uracil. Each spot on the microarray is created by
               depositing copies of a gene’s cDNA sequence onto a glass slide or
               other substrate with a high-speed robotic process that physically
               binds the sequence to a small spot on the slide. A spot is created for
               each gene sequence to be used in the microarray. The substrate and
               the spots of DNA sequences are collectively known as the microarray.
               Each spot is referred to as a probe, while the hybridizing agent
               (cDNA or cRNA) is referred to as the target.
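               The probe/target relationship rests on base pairing, and the uracil-to-
               thymine substitution mentioned above is a simple string rewrite. The
               sketch below illustrates both under a deliberately idealized assumption:
               a target "hybridizes" only if it is the exact reverse complement of the
               probe. Real hybridization tolerates some mismatches and depends on probe
               length, temperature, and stringency, none of which is modeled here. The
               helper names (rna_to_dna, reverse_complement, hybridizes) are
               hypothetical.

```python
# Watson-Crick pairing for the four DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def rna_to_dna(seq):
    """Rewrite an RNA sequence in the DNA alphabet (uracil -> thymine)."""
    return seq.upper().replace("U", "T")


def reverse_complement(dna):
    """Return the strand that pairs with `dna`, read in the 5'->3' direction."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def hybridizes(probe, target):
    """Idealized test: the target binds the probe only if it is an exact reverse complement."""
    return target.upper() == reverse_complement(probe)


if __name__ == "__main__":
    dna = rna_to_dna("AUGGCU")
    print(dna)                                  # ATGGCT
    print(reverse_complement(dna))              # AGCCAT
    print(hybridizes(dna, reverse_complement(dna)))  # True
```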
                   To measure gene expression for a cell population, mRNA is
               extracted from the cells and is reverse transcribed into cDNA. This
               cDNA sequence is identical to the DNA sequence for the gene found in
               the nucleus and is thus complementary to the cDNA probes on the
               microarray chip. The concentration of each sequence is multiplied