Page 51 - Applied Probability
2. Counting Methods and the EM Algorithm
individuals of this genotype, phase cannot be discerned. Now show
that the EM update for $p_{AB}$ is
$$p_{m+1,AB} = \frac{2n_{AABB} + n_{AABb} + n_{AaBB} + n_{mAB/ab}}{2n}$$
$$n_{mAB/ab} = n_{AaBb}\,\frac{2p_{mAB}\,p_{mab}}{2p_{mAB}\,p_{mab} + 2p_{mAb}\,p_{maB}}.$$
There are similar updates for $p_{Ab}$, $p_{aB}$, and $p_{ab}$, but these can be
dispensed with if one notes that for all $m$
$$p_A = p_{mAB} + p_{mAb}$$
$$p_B = p_{mAB} + p_{maB}$$
$$1 = p_{mAB} + p_{mAb} + p_{maB} + p_{mab},$$
where $p_A$ and $p_B$ are the gene-counting estimates of the frequencies
of alleles A and B. Implement this EM algorithm on the mosquito
data [17] given in Table 2.5. You should find that $\hat{p}_{AB} = .73$.
TABLE 2.5. Mosquito Data at the Idh1 and Mdh Loci
$n_{AABB} = 19$    $n_{AABb} = 5$    $n_{AAbb} = 0$
$n_{AaBB} = 8$     $n_{AaBb} = 8$    $n_{Aabb} = 0$
$n_{aaBB} = 0$     $n_{aaBb} = 0$    $n_{aabb} = 0$
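The update for $p_{AB}$ can be sketched in a few lines of Python. This is an illustrative sketch, not the text's own implementation: the function name `em_haplotype_AB`, the dictionary layout of the counts, the linkage-equilibrium starting value $p_A p_B$, and the stopping tolerance are all assumptions made here. The other three haplotype frequencies are recovered at each step from the constraints on $p_A$, $p_B$, and the total.

```python
def em_haplotype_AB(counts, tol=1e-12, max_iter=10_000):
    """EM estimate of the AB haplotype frequency from two-locus genotype counts.

    `counts` maps genotype labels such as "AaBb" to observed counts
    (hypothetical layout chosen for this sketch; missing labels count as 0).
    """
    g = lambda key: counts.get(key, 0)
    n = sum(counts.values())

    # Gene-counting estimates of the allele frequencies p_A and p_B.
    p_A = (2 * (g("AABB") + g("AABb") + g("AAbb"))
           + g("AaBB") + g("AaBb") + g("Aabb")) / (2 * n)
    p_B = (2 * (g("AABB") + g("AaBB") + g("aaBB"))
           + g("AABb") + g("AaBb") + g("aaBb")) / (2 * n)

    p_AB = p_A * p_B  # assumed start: linkage equilibrium
    for _ in range(max_iter):
        # Remaining haplotype frequencies follow from the three constraints.
        p_Ab = p_A - p_AB
        p_aB = p_B - p_AB
        p_ab = 1.0 - p_AB - p_Ab - p_aB

        # E-step: expected number of AB/ab phases among AaBb individuals.
        coupling = 2 * p_AB * p_ab
        repulsion = 2 * p_Ab * p_aB
        total = coupling + repulsion
        n_AB_ab = g("AaBb") * coupling / total if total > 0 else 0.0

        # M-step: count AB haplotypes and divide by the 2n haplotypes sampled.
        new_p_AB = (2 * g("AABB") + g("AABb") + g("AaBB") + n_AB_ab) / (2 * n)
        converged = abs(new_p_AB - p_AB) < tol
        p_AB = new_p_AB
        if converged:
            break
    return p_AB


# Mosquito data from Table 2.5 (zero counts omitted).
mosquito = {"AABB": 19, "AABb": 5, "AaBB": 8, "AaBb": 8}
p_AB_hat = em_haplotype_AB(mosquito)
print(round(p_AB_hat, 2))
```

On the Table 2.5 counts the iterates should settle near the value .73 quoted in the exercise. Starting at linkage equilibrium is only one reasonable choice; any starting value that keeps all four haplotype frequencies positive could be used instead.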
6. In a genetic linkage experiment, AB/ab animals are crossed to measure
the recombination fraction $\theta$ between two loci with alleles A and
a at the first locus and alleles B and b at the second locus. In this
design the dominant alleles A and B are in the coupling phase. Verify
that the offspring of an AB/ab × AB/ab mating fall into the four
categories AB, Ab, aB, and ab with probabilities
$\pi_1 = \frac{1}{2} + \frac{(1-\theta)^2}{4}$,
$\pi_2 = \frac{1-(1-\theta)^2}{4}$, $\pi_3 = \frac{1-(1-\theta)^2}{4}$,
and $\pi_4 = \frac{(1-\theta)^2}{4}$, respectively. Devise
an EM algorithm to estimate $\theta$, and apply it to the counts
$$(y_1, y_2, y_3, y_4) = (125, 18, 20, 34)$$
observed on 197 offspring of such matings. You should find the maximum
likelihood estimate $\hat{\theta} = .2083$ [11]. (Hints: Split the first
category into two so that there are five categories for the complete data.
Reparameterize by setting $\phi = (1-\theta)^2$.)
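One way to follow the hints is sketched below in Python. It is a possible solution outline under stated assumptions, not the text's own derivation: the first category is split into a part with probability $\frac{1}{2}$ and a part with probability $\frac{\phi}{4}$, so the E-step assigns the expected count $y_1 \phi_m / (2 + \phi_m)$ to the second part, and the M-step takes $\phi_{m+1}$ as the fraction of $\phi$-dependent counts that fall in cells with probability $\frac{\phi}{4}$. The function name `em_linkage`, the starting value, and the tolerance are choices made here for illustration.

```python
import math


def em_linkage(y, phi=0.5, tol=1e-12, max_iter=1000):
    """EM for phi = (1 - theta)^2 from the four phenotype counts y.

    Returns (theta_hat, phi_hat). The starting value phi=0.5 is an
    arbitrary assumption of this sketch.
    """
    y1, y2, y3, y4 = y
    for _ in range(max_iter):
        # E-step: expected part of y1 belonging to the cell with probability phi/4.
        x2 = y1 * phi / (2.0 + phi)
        # M-step: cells with probability phi/4 versus cells with (1 - phi)/4.
        new_phi = (x2 + y4) / (x2 + y2 + y3 + y4)
        converged = abs(new_phi - phi) < tol
        phi = new_phi
        if converged:
            break
    # Undo the reparameterization phi = (1 - theta)^2.
    return 1.0 - math.sqrt(phi), phi


theta_hat, phi_hat = em_linkage((125, 18, 20, 34))
print(round(theta_hat, 4))
```

Working in $\phi$ rather than $\theta$ keeps the M-step a simple ratio of counts; $\theta$ is recovered only at the end through $\theta = 1 - \sqrt{\phi}$. On the counts above the result should agree with the estimate .2083 quoted in the exercise.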
7. In an inbred population, the inbreeding coefficient $f$ is the probability
that two genes of a random person at some locus are both copies of
the same ancestral gene. Assume that there are $k$ codominant alleles
and that $p_i$ is the frequency of allele $A_i$. Show that $f p_i + (1-f) p_i^2$