Page 248 - Applied Probability

P. 248

235
11. Radiation Hybrid Mapping
Thus, the crux of the proof reduces to showing that it is possible to
match one to one each of the intervals (k, k + 1) against a union set I σ(i)
that contains or covers it. This assertion is a special case of Hall’s marriage
theorem [6]. A simple direct proof avoiding appeal to Hall’s theorem can
be given by induction on m. The assertion is certainly true for m =2.
Suppose it is true for m − 1 ≥ 2 and any permutation. There are two cases
to consider.
In the ﬁrst case, the last locus m is internal to the given permutation
σ in the sense that σ equals (σ(1),...,i,m,j, ... ,σ(m)). Omitting m from
σ gives a permutation ω of 1,... ,m − 1 for which the m − 2 intervals
(1, 2),... , (m − 2,m − 1) can be matched by induction. Assuming j< i,
the pair {i, j} in ω covers one of the intervals (j, j +1),... , (i − 1,i)in
this matching. In the permutation σ, match the pair {j, m} to this covered
interval. This is possible because j< i. To the pair {i, m} in σ, match the
interval (m − 1,m). The full matching for σ is constructed by appending
these two matches to the matches for ω minus the match for the pair {i, j}.
The situation with i< j is handled similarly.
In the second case, m is positioned at the end of σ. By our conven-
tion this means σ =(σ(1),... ,σ(m − 1),m). By induction, a matching
can be constructed between ω =(σ(1),...,σ(m − 1)) and the intervals
(1, 2),... , (m − 2,m − 1). To this matching append the permitted match
between the pair {σ(m − 1),m} and (m − 1,m). This completes the proof.
Clones with undetected typing errors can unduly inﬂuence the ranking
of locus orders. A clone bearing a large number of obligate breaks probably
should be retyped at the loci delimiting its obligate breaks. To identify
outlier clones, one needs to compute the distribution of the number of
obligate breaks under the true order and the true retention and breakage
probabilities. This distribution can be computed recursively by deﬁning
p k (i, j) to be the joint probability that there are j obligate breaks scored
among the ﬁrst k loci of a clone and that the kth locus is present in the
clone in i copies. The index k ranges from 1 to m, the index j ranges from 0
to k−1, and the index i equals 0 or 1. In this notation, the initial conditions
1 − r for i =0
p 1 (i, 0) =
r for i =1
are obvious. With no missing data and with θ k now indicating the breakage
probability between loci k and k + 1, the appropriate recurrence relations
for adding locus k + 1 are
p k+1 (0,j)= p k (0,j)(1 − θ k r)+ p k (1,j − 1)θ k (1 − r)
p k+1 (1,j)= p k (0,j − 1)θ k r + p k (1,j)[1 − θ k (1 − r)].

In these recurrence relations, p k (i, j) is taken as 0 whenever j< 0. When
the ﬁnal locus k = m is reached, the probabilities p m(i, j) can be summed
on i to produce the distribution of the number of obligate breaks. In prac-
tice, the best order identiﬁed and estimates of the retention and breakage

243 244 245 246 247 248 249 250 251 252 253