Page 73 - Applied Probability
3. Newton’s Method and Scoring
distribution. Use formula (3.13) and show that
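$$
\begin{aligned}
\mathrm{E}(p_i^2 \mid N_1 = n_1, \ldots, N_k = n_k) &= \frac{(n_i + \alpha_i)^{\overline{2}}}{(2n + \alpha_\cdot)^{\overline{2}}} \\
\mathrm{Var}(p_i^2 \mid N_1 = n_1, \ldots, N_k = n_k) &= \frac{(n_i + \alpha_i)^{\overline{4}}}{(2n + \alpha_\cdot)^{\overline{4}}} - \left[ \frac{(n_i + \alpha_i)^{\overline{2}}}{(2n + \alpha_\cdot)^{\overline{2}}} \right]^2 \\
\mathrm{E}(2 p_i p_j \mid N_1 = n_1, \ldots, N_k = n_k) &= \frac{2 (n_i + \alpha_i)(n_j + \alpha_j)}{(2n + \alpha_\cdot)^{\overline{2}}} \\
\mathrm{Var}(2 p_i p_j \mid N_1 = n_1, \ldots, N_k = n_k) &= \frac{4 (n_i + \alpha_i)^{\overline{2}} (n_j + \alpha_j)^{\overline{2}}}{(2n + \alpha_\cdot)^{\overline{4}}} - \left[ \frac{2 (n_i + \alpha_i)(n_j + \alpha_j)}{(2n + \alpha_\cdot)^{\overline{2}}} \right]^2 ,
\end{aligned}
$$
where $x^{\overline{r}} = x(x+1) \cdots (x + r - 1)$ denotes a rising factorial power.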
It is interesting that the above mean expressions entail
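$$
\begin{aligned}
\mathrm{E}(p_i^2 \mid N_1 = n_1, \ldots, N_k = n_k) &> \tilde{p}_i^2 \\
\mathrm{E}(2 p_i p_j \mid N_1 = n_1, \ldots, N_k = n_k) &< 2 \tilde{p}_i \tilde{p}_j ,
\end{aligned}
$$
where $\tilde{p}_i$ and $\tilde{p}_j$ are the posterior means of $p_i$ and $p_j$.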
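The moment formulas and the two inequalities can be checked numerically. The following is a minimal sketch, not a proof: it assumes the posterior is Dirichlet with parameters $n_i + \alpha_i$ and uses the standard Dirichlet moment identity, in which a monomial moment is a ratio of rising factorial powers. The function names, the 0-based indices, and the example counts are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def rising(x, r):
    """Rising factorial power x(x+1)...(x+r-1); equals 1 when r = 0."""
    result = Fraction(1)
    for s in range(r):
        result *= x + s
    return result


def dirichlet_moment(a, m):
    """E[prod_i p_i^{m_i}] for p ~ Dirichlet(a), as a ratio of rising factorials."""
    numerator = Fraction(1)
    for a_i, m_i in zip(a, m):
        numerator *= rising(Fraction(a_i), m_i)
    return numerator / rising(Fraction(sum(a)), sum(m))


def posterior_genotype_moments(counts, alpha, i, j):
    """Posterior mean and variance of p_i^2 and of 2 p_i p_j (i != j, 0-based)."""
    a = [Fraction(n) + Fraction(al) for n, al in zip(counts, alpha)]
    k = len(a)

    def moment(powers):
        m = [0] * k
        for index, power in powers.items():
            m[index] = power
        return dirichlet_moment(a, m)

    mean_hom = moment({i: 2})
    var_hom = moment({i: 4}) - mean_hom ** 2
    mean_het = 2 * moment({i: 1, j: 1})
    var_het = 4 * moment({i: 2, j: 2}) - mean_het ** 2
    return mean_hom, var_hom, mean_het, var_het


# Hypothetical allele counts (they sum to 2n = 10) and a flat prior.
counts = [3, 5, 2]
alpha = [1, 1, 1]
mean_hom, var_hom, mean_het, var_het = posterior_genotype_moments(counts, alpha, 0, 1)

# Posterior means of p_0 and p_1 themselves.
total = Fraction(sum(counts) + sum(alpha))
p0_tilde = Fraction(counts[0] + alpha[0]) / total
p1_tilde = Fraction(counts[1] + alpha[1]) / total

# The two inequalities stated in the text.
assert mean_hom > p0_tilde ** 2
assert mean_het < 2 * p0_tilde * p1_tilde
```

Exact rational arithmetic via `Fraction` keeps the comparison free of rounding error, which matters because the gap between the two sides of each inequality shrinks as the sample size grows.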
17. Problem 5 of Chapter 2 considers haplotype frequency estimation for
two linked, biallelic loci. The EM algorithm discussed there relies on
the allele-counting estimates $p_A$, $p_a$, $p_B$, and $p_b$.
(a) Construct the Dirichlet prior from these estimates mentioned
in Section 3.8 and devise an EM algorithm that maximizes the
product of the prior and the likelihood of the observed data. In
particular, show that the EM update for $p_{AB}$ is
$$
\begin{aligned}
p_{m+1,AB} &= \frac{2 n_{AABB} + n_{AABb} + n_{AaBB} + n_{mAB/ab} + \beta_{AB}}{2n + \beta} \\
n_{mAB/ab} &= n_{AaBb} \, \frac{2 p_{mAB} \, p_{mab}}{2 p_{mAB} \, p_{mab} + 2 p_{mAb} \, p_{maB}} ,
\end{aligned}
$$
where $\beta_{AB} = \alpha_{AB} - 1$ and $\beta = \alpha - 4$. There are similar updates
for $p_{Ab}$, $p_{aB}$, and $p_{ab}$. (Hint: The log prior passes untouched
through the conditional expectation of the E step of the EM
algorithm.)
(b) Implement this EM algorithm on the mosquito data given in
Table 2.5 of Chapter 2 for the value $\alpha - 4 = 10$ and starting
from the estimated linkage equilibrium frequencies. You should
find that $\hat{p}_{AB} = .717$, $\hat{p}_{Ab} = .083$, $\hat{p}_{aB} = .121$, and $\hat{p}_{ab} = .079$.
(c) Describe how you would generalize the algorithm to more than
two loci and more than two alleles per locus.
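
One possible way to organize the update of part (a) in code is sketched below. It is an illustration under stated assumptions, not the book's implementation. The function name `map_em_haplotypes`, the genotype keys, and the genotype counts are hypothetical, and the counts are not the mosquito data of Table 2.5. The sketch takes $\beta$ to be the sum of the four $\beta_h = \alpha_h - 1$, and the usage example assumes, without confirmation from Section 3.8, that each $\beta_h$ equals 10 times the corresponding linkage equilibrium frequency.

```python
def map_em_haplotypes(counts, beta, p0, tol=1e-10, max_iter=1000):
    """MAP-EM sketch for haplotype frequencies at two biallelic loci.

    counts: genotype counts keyed by strings such as "AaBb" (missing keys count as 0).
    beta:   dict over "AB", "Ab", "aB", "ab" holding beta_h = alpha_h - 1.
    p0:     strictly positive starting frequencies over the same four keys.
    """
    n = sum(counts.values())          # number of individuals, so 2n haplotypes
    beta_total = sum(beta.values())   # assumed to play the role of beta

    def c(genotype):
        return counts.get(genotype, 0)

    # Haplotype counts contributed by genotypes with unambiguous phase.
    fixed = {
        "AB": 2 * c("AABB") + c("AABb") + c("AaBB"),
        "Ab": 2 * c("AAbb") + c("AABb") + c("Aabb"),
        "aB": 2 * c("aaBB") + c("AaBB") + c("aaBb"),
        "ab": 2 * c("aabb") + c("Aabb") + c("aaBb"),
    }

    p = dict(p0)
    for _ in range(max_iter):
        # E step: split the double heterozygotes between phases AB/ab and Ab/aB.
        cis = 2 * p["AB"] * p["ab"]
        trans = 2 * p["Ab"] * p["aB"]
        n_cis = c("AaBb") * cis / (cis + trans)
        n_trans = c("AaBb") - n_cis
        extra = {"AB": n_cis, "ab": n_cis, "Ab": n_trans, "aB": n_trans}

        # M step: the pseudo-counts beta_h enter the numerator and denominator.
        new = {h: (fixed[h] + extra[h] + beta[h]) / (2 * n + beta_total) for h in p}
        if max(abs(new[h] - p[h]) for h in p) < tol:
            return new
        p = new
    return p


# Hypothetical genotype counts for n = 50 individuals (illustration only).
counts = {"AABB": 20, "AABb": 5, "AAbb": 1, "AaBB": 8, "AaBb": 10,
          "Aabb": 2, "aaBB": 1, "aaBb": 2, "aabb": 1}
n = sum(counts.values())

# Allele-counting estimates p_A and p_B.
p_A = (2 * (counts["AABB"] + counts["AABb"] + counts["AAbb"])
       + counts["AaBB"] + counts["AaBb"] + counts["Aabb"]) / (2 * n)
p_B = (2 * (counts["AABB"] + counts["AaBB"] + counts["aaBB"])
       + counts["AABb"] + counts["AaBb"] + counts["aaBb"]) / (2 * n)

# Linkage equilibrium starting values and the assumed prior pseudo-counts.
equilibrium = {"AB": p_A * p_B, "Ab": p_A * (1 - p_B),
               "aB": (1 - p_A) * p_B, "ab": (1 - p_A) * (1 - p_B)}
beta = {h: 10 * equilibrium[h] for h in equilibrium}

p_hat = map_em_haplotypes(counts, beta, equilibrium)
print(p_hat)
```

Because the four numerators sum to $2n + \beta$, every iterate is automatically a probability vector, so no explicit renormalization is needed. Only the double heterozygote class requires an E step; all other genotypes determine their haplotypes uniquely.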