Page 32 - Biosystems Engineering
Microarray Data Analysis Using Machine Learning Methods
The resulting model is tested on the validation sample, and the process
is repeated until every sample has appeared in the validation set. In the
holdout method, only a single subset (also known as a validation set) is
used to estimate the generalization error. The holdout method therefore
does not involve crossing: the training and validation subsets never
exchange roles. In bootstrapping, a subsample is randomly drawn with
replacement from the full training dataset at each iteration.
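
The three resampling schemes described above can be sketched in terms of
sample indices alone. This is a minimal illustration using only the
Python standard library, not a reference implementation. The function
names, the default validation fraction, and the use of the undrawn
("out-of-bag") samples as a validation set in the bootstrap are
assumptions made for the example.

```python
import random


def k_fold_splits(n_samples, k, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation.

    Every sample lands in the validation set of exactly one fold.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    for i in range(k):
        validation = folds[i]
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, validation


def holdout_split(n_samples, val_fraction=0.25, seed=0):
    """Return one (train, validation) split; the roles are never swapped."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_val = max(1, int(round(n_samples * val_fraction)))
    return indices[n_val:], indices[:n_val]


def bootstrap_sample(n_samples, rng):
    """Draw n_samples indices with replacement.

    Also returns the indices that were never drawn (assumed here to
    serve as the validation set for that iteration).
    """
    drawn = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn_set = set(drawn)
    out_of_bag = [i for i in range(n_samples) if i not in drawn_set]
    return drawn, out_of_bag
```

Setting k equal to the number of samples in k_fold_splits gives the
leave-one-out case, in which each validation set holds a single sample.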
1.3 Microarray Technology
Construction of microarrays generally depends on information
gained from genome sequencing or high-throughput expressed
sequence tag (EST) sequencing projects, which provide large sets of
annotated clones and sequences. Based on how the DNA substrates are
applied to the slides, there are two major types of microarrays:
predesigned chips and spotted microarrays. On predesigned chips,
oligomers of 15 to 25 nucleotides are synthesized directly on the chip
by a photolithographic technique. On spotted microarrays, in contrast,
a spotting robot or an inkjet device places genomic DNA, cDNA, or
50- to 70-mer oligonucleotides on the slide. Of the two types, the
spotted microarray is cheaper to produce and offers more flexibility in
experimental design and data analysis. The sensitivity of spotted
microarrays decreases from cDNA to long oligos to short oligos, whereas
the specificity of detection increases from cDNA to short oligos. Thus,
short oligo microarrays have the potential to detect splicing variants
and members of multigene families. In contrast, because of their higher
sensitivity, cDNA microarrays are more suitable for gene expression
studies in related species.
1.3.1 cDNA Microarray
The construction of cDNA microarrays begins with the production of
cDNA segments that represent each gene. Each segment is the complement
to the actual DNA sequence of a gene and differs from the
corresponding mRNA sequence only in that thymine in cDNA
replaces uracil in mRNA. Each spot on the microarray is created by
depositing copies of a gene’s cDNA sequence on a glass slide or other
substrate in a high-speed robotic process that physically binds the
sequence to a small spot on the slide. One spot is created for each gene
sequence to be used in the microarray. The substrate and the spots of
DNA sequences are collectively known as the microarray. Each spot
is referred to as a probe, while the hybridizing agent (cDNA or cRNA)
is the target.
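
The two sequence relationships used above, thymine standing in for
uracil and complementarity between probe and target, can be made
concrete with a short sketch. The function names and the example
sequence are hypothetical, and the hybridization test assumes perfect
base pairing over the full length, which is a simplification:
hybridization on a real array tolerates some mismatches.

```python
# Watson-Crick pairing rules for DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def mrna_to_dna_alphabet(mrna):
    """Rewrite an mRNA sequence in DNA letters (thymine replaces uracil)."""
    return mrna.upper().replace("U", "T")


def reverse_complement(dna):
    """Return the strand that pairs with `dna`, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna.upper()))


def can_hybridize(probe, target):
    """True if target is the exact reverse complement of probe.

    Assumption: perfect, full-length pairing only.
    """
    return reverse_complement(probe) == target.upper()


mrna = "AUGGCCUAA"                      # hypothetical transcript fragment
probe = mrna_to_dna_alphabet(mrna)      # "ATGGCCTAA"
target = reverse_complement(probe)      # "TTAGGCCAT"
print(can_hybridize(probe, target))     # True
```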
To measure gene expression in a cell population, mRNA is
extracted from the cells and reverse-transcribed into cDNA. This
cDNA sequence is identical to the DNA sequence for the gene found in
the nucleus and is thus complementary to the cDNA probes on the
microarray chip. The concentration of each sequence is multiplied