Page 305 - Applied Probability
P. 305

13. Sequence Analysis
                                                                                            293
                              alignment varies with the number of sequences, suppose that we have m
                                                                       i
                                                                i
                              sequences and the ith sequence is (x ,...,x ). Dynamic programming
                                                                1
                                                                       n i
                                                                             j
                                                              m
                                                       1
                              requires a score function s(y ,...,y ), where each y is a letter or a gap
                              −, with at most m − 1 gaps. The score function is sufficient to determine
                              a solution given the assumption of proportional gap scores. The dynamic
                              programming algorithm operates recursively by updating the maximum
                              score S(i 1 ,...,i m) assigned to the best alignment involving the first i j
                              elements of the jth sequence. To express the update as concisely as possible,
                              define the increment ∆ i to 0 or 1 and the operator ∆ i x to be
                                                            ,
                                                              − if ∆ i =0
                                                   ∆ i x  =               .
                                                              x  if ∆ i =1
                              In this notation, dynamic programming gives
                                      S(i 1 ,... ,i m )
                                                                               1
                                                                                         m
                                  =      max    {S(i 1 − ∆ 1 ,... ,i m − ∆ m )+ s(∆ 1 x ,..., ∆ m x )},
                                      ∆ 1 +···+∆ m>0                           i 1       i m
                              beginning with S(0,... , 0) = 0 and omitting the value ∆ j = 1 whenever
                                                                               m
                              the index i j = 0. Because the maximum extends over 2 − 1 possible vec-
                              tors (∆ 1 ,... , ∆ m ), the overall complexity is O(2 m    m  n i ). This rapidly
                                                                             i=1
                              mounting complexity obviously limits the usefulness of dynamic program-
                              ming and has prompted a diversity of heuristic treatments of the multiple
                              alignment problem [2, 10, 12].
                              Problems
                                 1. Suppose we wished to test the hypothesis that recognition sites for
                                   a certain restriction enzyme are Poisson distributed along a chromo-
                                   some with a predicted intensity λ 0 . One experimental approach would
                                   be to sample the fragments resulting from a digest of the chromosome
                                   by the restriction enzyme. The sampling is done by a probe, which
                                   is a small segment of the overall chromosome. The probe is made
                                   radioactive or fluorescent and will combine with and highlight those
                                   fragments sharing a common stretch of DNA with it. The lengths
                                   of the fragments can be measured, but it is impossible to tell which
                                   fragments are internal to the probe and which are partially external.
                                   To test the hypothesis that the true intensity λ = λ 0 , one can add
                                   up all fragment lengths and subtract the length of the probe. The
                                   resulting statistic X should be the sum of two independent exponen-
                                   tially distributed random variables with intensity λ. To complete the
                                   design of the experiment, we use n widely spaced probes and measure
                                   n independent excess distances X 1 ,...,X n .
                                    (a) Find the maximum likelihood estimate of λ.
   300   301   302   303   304   305   306   307   308   309   310