Page 248 - Applied Probability
P. 248

235
                                                               11. Radiation Hybrid Mapping
                                Thus, the crux of the proof reduces to showing that it is possible to
                              match one to one each of the intervals (k, k + 1) against a union set I σ(i)
                              that contains or covers it. This assertion is a special case of Hall’s marriage
                              theorem [6]. A simple direct proof avoiding appeal to Hall’s theorem can
                              be given by induction on m. The assertion is certainly true for m =2.
                              Suppose it is true for m − 1 ≥ 2 and any permutation. There are two cases
                              to consider.
                                In the first case, the last locus m is internal to the given permutation
                              σ in the sense that σ equals (σ(1),...,i,m,j, ... ,σ(m)). Omitting m from
                              σ gives a permutation ω of 1,... ,m − 1 for which the m − 2 intervals
                              (1, 2),... , (m − 2,m − 1) can be matched by induction. Assuming j< i,
                              the pair {i, j} in ω covers one of the intervals (j, j +1),... , (i − 1,i)in
                              this matching. In the permutation σ, match the pair {j, m} to this covered
                              interval. This is possible because j< i. To the pair {i, m} in σ, match the
                              interval (m − 1,m). The full matching for σ is constructed by appending
                              these two matches to the matches for ω minus the match for the pair {i, j}.
                              The situation with i< j is handled similarly.
                                In the second case, m is positioned at the end of σ. By our conven-
                              tion this means σ =(σ(1),... ,σ(m − 1),m). By induction, a matching
                              can be constructed between ω =(σ(1),...,σ(m − 1)) and the intervals
                              (1, 2),... , (m − 2,m − 1). To this matching append the permitted match
                              between the pair {σ(m − 1),m} and (m − 1,m). This completes the proof.
                                Clones with undetected typing errors can unduly influence the ranking
                              of locus orders. A clone bearing a large number of obligate breaks probably
                              should be retyped at the loci delimiting its obligate breaks. To identify
                              outlier clones, one needs to compute the distribution of the number of
                              obligate breaks under the true order and the true retention and breakage
                              probabilities. This distribution can be computed recursively by defining
                              p k (i, j) to be the joint probability that there are j obligate breaks scored
                              among the first k loci of a clone and that the kth locus is present in the
                              clone in i copies. The index k ranges from 1 to m, the index j ranges from 0
                              to k−1, and the index i equals 0 or 1. In this notation, the initial conditions
                                                               1 − r  for i =0
                                                 p 1 (i, 0) =
                                                              r     for i =1
                              are obvious. With no missing data and with θ k now indicating the breakage
                              probability between loci k and k + 1, the appropriate recurrence relations
                              for adding locus k + 1 are
                                       p k+1 (0,j)= p k (0,j)(1 − θ k r)+ p k (1,j − 1)θ k (1 − r)
                                       p k+1 (1,j)= p k (0,j − 1)θ k r + p k (1,j)[1 − θ k (1 − r)].

                              In these recurrence relations, p k (i, j) is taken as 0 whenever j< 0. When
                              the final locus k = m is reached, the probabilities p m(i, j) can be summed
                              on i to produce the distribution of the number of obligate breaks. In prac-
                              tice, the best order identified and estimates of the retention and breakage
   243   244   245   246   247   248   249   250   251   252   253