Thus, the difference $\ln g(y \mid \theta) - Q(\theta \mid \theta_n)$ attains its minimum when $\theta = \theta_n$. If we choose $\theta_{n+1}$ to maximize $Q(\theta \mid \theta_n)$, then it follows that
\begin{align*}
\ln g(y \mid \theta_{n+1}) &= Q(\theta_{n+1} \mid \theta_n) + [\ln g(y \mid \theta_{n+1}) - Q(\theta_{n+1} \mid \theta_n)] \\
&\geq Q(\theta_n \mid \theta_n) + [\ln g(y \mid \theta_n) - Q(\theta_n \mid \theta_n)] \\
&= \ln g(y \mid \theta_n),
\end{align*}
with strict inequality when $f(x \mid \theta_{n+1})/g(y \mid \theta_{n+1})$ and $f(x \mid \theta_n)/g(y \mid \theta_n)$ are different conditional densities or when $Q(\theta_{n+1} \mid \theta_n) > Q(\theta_n \mid \theta_n)$. This verifies the promised ascent property $\ln g(y \mid \theta_{n+1}) \geq \ln g(y \mid \theta_n)$ of the EM algorithm.
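To see the ascent property in action, the following Python fragment runs EM on a hypothetical two-component Poisson mixture (the mixture model, simulated data, and starting values are illustrative only and are not part of the ABO development) and prints the observed loglikelihood $\ln g(y \mid \theta_n)$ after each iteration; the printed values never decrease.

```python
import numpy as np
from scipy.stats import poisson

# Hypothetical data: counts drawn from a two-component Poisson mixture.
rng = np.random.default_rng(0)
z = rng.random(500) < 0.3
y = np.where(z, rng.poisson(2.0, 500), rng.poisson(7.0, 500))

def loglik(pi, lam1, lam2):
    # Observed-data loglikelihood ln g(y | theta) under the mixture model.
    return np.log(pi * poisson.pmf(y, lam1)
                  + (1 - pi) * poisson.pmf(y, lam2)).sum()

pi, lam1, lam2 = 0.5, 1.0, 10.0               # crude starting values
for it in range(15):
    # E step: posterior probability that each count comes from component 1.
    w1 = pi * poisson.pmf(y, lam1)
    w2 = (1 - pi) * poisson.pmf(y, lam2)
    w = w1 / (w1 + w2)
    # M step: maximizing the surrogate Q gives weighted-average updates.
    pi = w.mean()
    lam1 = (w * y).sum() / w.sum()
    lam2 = ((1 - w) * y).sum() / (1 - w).sum()
    print(it, loglik(pi, lam1, lam2))         # monotonically nondecreasing
```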
2.5 Allele Frequency Estimation by the EM Algorithm
Let us return to the ABO example and formalize gene counting as an EM algorithm. The observed numbers of people in each of the four phenotypic categories constitute the observed data $Y$, while the unknown numbers of people in each of the six genotypic categories constitute the complete data $X$. Let $n_{A/A}$ be the number of people of genotype $A/A$. Define $n_{A/O}$, $n_{B/B}$, and $n_{B/O}$ similarly, and set $n = n_A + n_B + n_{AB} + n_O$. Note that the $n_{AB}$ people of phenotype AB and the $n_O$ people of phenotype O are already correctly assigned to their respective genotypes $A/B$ and $O/O$. With this notation the complete data loglikelihood becomes
\begin{align}
\ln f(X \mid p) ={}& n_{A/A} \ln p_A^2 + n_{A/O} \ln(2 p_A p_O) + n_{B/B} \ln p_B^2 \nonumber \\
& + n_{B/O} \ln(2 p_B p_O) + n_{AB} \ln(2 p_A p_B) + n_O \ln p_O^2 \tag{2.3} \\
& + \ln \binom{n}{n_{A/A}\; n_{A/O}\; n_{B/B}\; n_{B/O}\; n_{AB}\; n_O}. \nonumber
\end{align}
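For concreteness, here is a minimal Python sketch that evaluates (2.3) from a set of genotype counts. The function name, the dictionary layout, and the example numbers are assumptions made purely for illustration; the genotype probabilities are the Hardy-Weinberg proportions $p_A^2$, $2p_Ap_O$, and so on appearing in (2.3).

```python
from math import lgamma, log

def complete_data_loglik(counts, p):
    """Evaluate the complete-data loglikelihood (2.3).

    counts: genotype counts keyed by 'A/A', 'A/O', 'B/B', 'B/O', 'AB', 'O'
    p: allele frequencies keyed by 'A', 'B', 'O' (summing to one)
    """
    n = sum(counts.values())
    # log of the multinomial coefficient n! / (n_{A/A}! ... n_O!)
    log_coef = lgamma(n + 1) - sum(lgamma(c + 1) for c in counts.values())
    probs = {'A/A': p['A'] ** 2,         'A/O': 2 * p['A'] * p['O'],
             'B/B': p['B'] ** 2,         'B/O': 2 * p['B'] * p['O'],
             'AB':  2 * p['A'] * p['B'], 'O':   p['O'] ** 2}
    return log_coef + sum(c * log(probs[g]) for g, c in counts.items())

# Hypothetical genotype counts and allele frequencies, purely for illustration.
counts = {'A/A': 15, 'A/O': 60, 'B/B': 3, 'B/O': 25, 'AB': 13, 'O': 84}
print(complete_data_loglik(counts, {'A': 0.25, 'B': 0.08, 'O': 0.67}))
```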
In the E step of the EM algorithm, we take the expectation of $\ln f(X \mid p)$ conditional on the observed counts $n_A$, $n_B$, $n_{AB}$, and $n_O$ and the current parameter vector $p_m = (p_{mA}, p_{mB}, p_{mO})^t$. It is obvious that
\begin{align*}
E(n_{AB} \mid Y, p_m) &= n_{AB} \\
E(n_O \mid Y, p_m) &= n_O.
\end{align*}
A moment’s reflection also yields
\begin{align*}
n_{m,A/A} &= E(n_{A/A} \mid Y, p_m) = n_A \frac{p_{mA}^2}{p_{mA}^2 + 2 p_{mA} p_{mO}} \\
n_{m,A/O} &= E(n_{A/O} \mid Y, p_m) = n_A \frac{2 p_{mA} p_{mO}}{p_{mA}^2 + 2 p_{mA} p_{mO}}.
\end{align*}
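A minimal Python sketch of these two conditional expectations follows; the function name and the numbers in the sample call are hypothetical, and the B-phenotype counts are handled by the symmetric formulas.

```python
def expected_A_genotypes(n_A, p_mA, p_mO):
    """E step for phenotype-A individuals: expected numbers of A/A and
    A/O genotypes given the current allele frequencies p_mA and p_mO."""
    denom = p_mA ** 2 + 2 * p_mA * p_mO          # Hardy-Weinberg P(phenotype A)
    n_AA = n_A * p_mA ** 2 / denom               # n_{m,A/A}
    n_AO = n_A * 2 * p_mA * p_mO / denom         # n_{m,A/O}
    return n_AA, n_AO

# Illustrative call with hypothetical n_A and current frequencies p_m.
print(expected_A_genotypes(182, 0.3, 0.6))       # roughly (36.4, 145.6)
```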