Page 250 - Applied Probability
P. 250
237
11. Radiation Hybrid Mapping
Problems 2 and 3 elaborate on this point.
Generalization of the likelihood expressions in (11.5) to more loci involves
two subtleties. First, the sheer number of terms accounting for all possible
breakage and retention patterns quickly becomes unwieldy. Second, missing
data can no longer be ignored. The key to efficient likelihood computation is
to recognize that the likelihood splits into simple factors based on a hidden
Markov property of the underlying model. To expose this factorization
property, again assume that the loci 1,... ,m occur in numerical order
along the chromosome. Let θ i be the breakage probability on the interval
connecting loci i and i+1, and suppose only loci 1 ≤ t 1 <t 2 < ··· <t n ≤ m
, then
are typed. If the typing result at locus t k is x t k
Pr(X = x) = Pr(X t 1 = x t 1 ) (11.6)
n
× Pr(X t i = x t i ) | X t 1 = x t 1 ,...,X t i−1 = x t i−1 ).
i=2
) is immediately available from (11.4). In the degenerate
Now Pr(X t 1 = x t 1
case n = 1, the product in (11.6) is taken as 1. In general, the independence
property of the governing Poisson process implies
).
Pr(X t i = x t i | X t 1 ,...,X t i−1 )=Pr(X t i = x t i | X t i−1
,
Indeed, when X t i = X t i−1
t i −1
)= 1 − (1 − θ j ) r x t i (1 − r) 1−x t i
Pr(X t i = x t i | X t 1 ,...,X t i−1
j=t i−1
t i −1
+ (1 − θ j ). (11.7)
j=t i−1
The first term on the right of (11.7) involves conditioning on at least one
break between loci t i−1 and t i . Here the retention fate of locus t i is no
longer tied to that of locus t i−1 . The second term involves conditioning
, we have the simpler
on the complementary event. When X t i = X t i−1
expression
t i −1
)= 1 − (1 − θ j ) r x t i (1 − r) 1−x t i
Pr(X t i = x t i | X t 1 ,...,X t i−1
j=t i−1
since a break must occur somewhere between the two loci.
The EM algorithm provides an attractive avenue to maximum likelihood
estimation of the m parameters θ 1 ,...,θ m−1 and r. Collect these m pa-
t
rameters into a vector γ =(θ 1 ,...,θ m−1 ,r) . Each of the entries γ i of γ
can be viewed as a success probability for a hidden binomial trial. As doc-
umented in Problem 9 of Chapter 2, the EM update for any one of these