Page 250 - Applied Probability
P. 250

237
                                                               11. Radiation Hybrid Mapping
                              Problems 2 and 3 elaborate on this point.
                                Generalization of the likelihood expressions in (11.5) to more loci involves
                              two subtleties. First, the sheer number of terms accounting for all possible
                              breakage and retention patterns quickly becomes unwieldy. Second, missing
                              data can no longer be ignored. The key to efficient likelihood computation is
                              to recognize that the likelihood splits into simple factors based on a hidden
                              Markov property of the underlying model. To expose this factorization
                              property, again assume that the loci 1,... ,m occur in numerical order
                              along the chromosome. Let θ i be the breakage probability on the interval
                              connecting loci i and i+1, and suppose only loci 1 ≤ t 1 <t 2 < ··· <t n ≤ m
                                                                        , then
                              are typed. If the typing result at locus t k is x t k
                                 Pr(X = x)  = Pr(X t 1  = x t 1 )                         (11.6)
                                                   n

                                                ×    Pr(X t i  = x t i  ) | X t 1  = x t 1  ,...,X t i−1  = x t i−1 ).
                                                  i=2
                                              ) is immediately available from (11.4). In the degenerate
                              Now Pr(X t 1  = x t 1
                              case n = 1, the product in (11.6) is taken as 1. In general, the independence
                              property of the governing Poisson process implies
                                                                                       ).
                                      Pr(X t i  = x t i  | X t 1  ,...,X t i−1  )=Pr(X t i  = x t i  | X t i−1
                                                     ,
                              Indeed, when X t i  = X t i−1
                                                                    t i −1


                                                         )=     1 −      (1 − θ j ) r x t i (1 − r) 1−x t i
                                Pr(X t i  = x t i  | X t 1  ,...,X t i−1
                                                                    j=t i−1
                                                                   t i −1

                                                                +      (1 − θ j ).        (11.7)
                                                                  j=t i−1
                              The first term on the right of (11.7) involves conditioning on at least one
                              break between loci t i−1 and t i . Here the retention fate of locus t i is no
                              longer tied to that of locus t i−1 . The second term involves conditioning
                                                                           , we have the simpler
                              on the complementary event. When X t i   = X t i−1
                              expression
                                                                    t i −1


                                                         )=     1 −      (1 − θ j ) r x t i (1 − r) 1−x t i
                                Pr(X t i  = x t i  | X t 1  ,...,X t i−1
                                                                    j=t i−1
                              since a break must occur somewhere between the two loci.
                                The EM algorithm provides an attractive avenue to maximum likelihood
                              estimation of the m parameters θ 1 ,...,θ m−1 and r. Collect these m pa-
                                                                     t
                              rameters into a vector γ =(θ 1 ,...,θ m−1 ,r) . Each of the entries γ i of γ
                              can be viewed as a success probability for a hidden binomial trial. As doc-
                              umented in Problem 9 of Chapter 2, the EM update for any one of these
   245   246   247   248   249   250   251   252   253   254   255