Page 305 - Applied Probability
P. 305
13. Sequence Analysis
293
alignment varies with the number of sequences, suppose that we have m
i
i
sequences and the ith sequence is (x ,...,x ). Dynamic programming
1
n i
j
m
1
requires a score function s(y ,...,y ), where each y is a letter or a gap
−, with at most m − 1 gaps. The score function is sufficient to determine
a solution given the assumption of proportional gap scores. The dynamic
programming algorithm operates recursively by updating the maximum
score S(i 1 ,...,i m) assigned to the best alignment involving the first i j
elements of the jth sequence. To express the update as concisely as possible,
define the increment ∆ i to 0 or 1 and the operator ∆ i x to be
,
− if ∆ i =0
∆ i x = .
x if ∆ i =1
In this notation, dynamic programming gives
S(i 1 ,... ,i m )
1
m
= max {S(i 1 − ∆ 1 ,... ,i m − ∆ m )+ s(∆ 1 x ,..., ∆ m x )},
∆ 1 +···+∆ m>0 i 1 i m
beginning with S(0,... , 0) = 0 and omitting the value ∆ j = 1 whenever
m
the index i j = 0. Because the maximum extends over 2 − 1 possible vec-
tors (∆ 1 ,... , ∆ m ), the overall complexity is O(2 m m n i ). This rapidly
i=1
mounting complexity obviously limits the usefulness of dynamic program-
ming and has prompted a diversity of heuristic treatments of the multiple
alignment problem [2, 10, 12].
Problems
1. Suppose we wished to test the hypothesis that recognition sites for
a certain restriction enzyme are Poisson distributed along a chromo-
some with a predicted intensity λ 0 . One experimental approach would
be to sample the fragments resulting from a digest of the chromosome
by the restriction enzyme. The sampling is done by a probe, which
is a small segment of the overall chromosome. The probe is made
radioactive or fluorescent and will combine with and highlight those
fragments sharing a common stretch of DNA with it. The lengths
of the fragments can be measured, but it is impossible to tell which
fragments are internal to the probe and which are partially external.
To test the hypothesis that the true intensity λ = λ 0 , one can add
up all fragment lengths and subtract the length of the probe. The
resulting statistic X should be the sum of two independent exponen-
tially distributed random variables with intensity λ. To complete the
design of the experiment, we use n widely spaced probes and measure
n independent excess distances X 1 ,...,X n .
(a) Find the maximum likelihood estimate of λ.