Page 307 - Applied Probability

P. 307

13. Sequence Analysis
295
m
p R s
F(s)
=
m
p R s +(1 − s)Q(s)
Q(1)
=
F (1)
,
p R
n
where Q(s)= m−1 q n s . Note that F (1) is the expected time be-

n=0
tween renewals. Explicitly calculate F (1) for the pattern GCGC.

(Hints: To demonstrate equation (13.11), condition on the most re-
cent renewal of the pattern. In proving equation (13.12), multiply
n
equation (13.11) by s and sum on n. You will need to use the fact
u j = 0 for 1 ≤ j ≤ m − 1.)
5. Let x =(x 1 ,...,x m ) and y =(y 1 ,... ,y n ) be two strings such that
m ≥ n. Show that:
m

(a) The number of alignments of length m is .
n
n
m+2
(b) The number of alignments of length m +2 is .
2 n
n
m+k
(c) In general, the number of alignments of length m+k is .
k n
n
m+k
(d) The total number of alignments is .
k k n

Note that we distinguish the double pairs x i − and −x i .
−y j y j −

6. If we refuse to distinguish the double pairs x i − and −x i in an
−y j y j −
alignment between two strings x =(x 1 ,... ,x m ) and y =(y 1 ,...,y n),
m+n
then show that the number of alignments reduces to . (Hints:
m

Argue that an alignment is completely determined by what pairs x i
y j
n
m
n
m m+n
occur within it. Hence, there are = =
k k k k m−k k m
possible alignments.)
7. A gap penalty g(k) ≥ 0 is said to be subadditive if g(k) ≤ kg(1)
for all k ≥ 0 and concave if g(k +1) − g(k) ≥ g(k +2) − g(k +1)
for all k ≥ 0. Show that a concave gap penalty g(k)with g(0) = 0
is subadditive and that the particular choice g(k)= α ln(k +1) is
concave for α ≥ 0. (Hint: Any concave function on the nonnegative
integers can be extended to a concave function on [0, ∞) by linear
interpolation. Subadditivity is then geometrically obvious.)
8. Continuing Problem 7, let f(l) be a function on the nonnegative in-
tegers, and deﬁne a function g k (l)= f(k + l) − f(k) for every integer
k ≥ 0. Show that f(l) is concave if and only if each g k (l) is subaddi-
tive.
9. Consider a word of length n chosen from the four-letter DNA alphabet
{A,T,C,G}. If letters are chosen for each position independently with
probabilities p A, p T , p C , and p G , respectively, then let R A be the
n

302 303 304 305 306 307 308 309 310 311 312