Page 43 - Applied Probability
P. 43

2. Counting Methods and the EM Algorithm
                              26
                                Thus, the difference ln g(y | θ) − Q(θ | θ n ) attains its minimum when
                              θ = θ n . If we choose θ n+1 to maximize Q(θ | θ n ), then it follows that
                                  ln g(y | θ n+1 )= Q(θ n+1 | θ n ) + [ln g(y | θ n+1 ) − Q(θ n+1 | θ n )]
                                                ≥ Q(θ n | θ n ) + [ln g(y | θ n ) − Q(θ n | θ n )]
                                                =ln g(y | θ n ),
                              with strict inequality when f(x | θ n+1 )/g(y | θ n+1 ) and f(x | θ n )/g(y | θ n )
                              are different conditional densities or when Q(θ n+1 | θ n ) >Q(θ n | θ n ). This
                              verifies the promised ascent property ln g(y | θ n+1 ) ≥ ln g(y | θ n )ofthe
                              EM algorithm.
                              2.5 Allele Frequency Estimation by the EM
                                     Algorithm

                              Let us return to the ABO example and formalize gene counting as an EM
                              algorithm. The observed numbers of people in each of the four phenotypic
                              categories constitute the observed data Y , while the unknown numbers of
                              people in each of the six genotypic categories constitute the complete data
                              X. Let n A/A be the number of people of genotype A/A. Define n A/O , n B/B ,
                              and n B/O similarly and set n = n A + n B + n AB + n O . Note that the n AB
                              people of phenotype AB and the n O people of phenotype O are already
                              correctly assigned to their respective genotypes A/B and O/O. With this
                              notation the complete data loglikelihood becomes
                                                         2
                                   ln f(X | p)= n A/A ln p + n A/O ln(2p A p O )+ n B/B ln p 2
                                                         A                           B
                                                                                         2
                                                 + n B/O ln(2p B p O )+ n AB ln(2p Ap B )+ n O ln p (2.3)
                                                                                         O

                                                                    n
                                                 +ln                                .
                                                       n A/A n A/O n B/B n B/O n AB n O
                                In the E step of the EM algorithm, we take the expectation of ln f(X | p)
                              conditional on the observed counts n A , n B , n AB , and n O and the current
                                                                  t
                              parameter vector p m =(p mA ,p mB ,p mO ) . It is obvious that
                                                   E(n AB | Y, p m)= n AB
                                                     E(n O | Y, p m)= n O .
                              A moment’s reflection also yields
                                                        =E(n A/A | Y, p m)
                                                n m,A/A
                                                                    p 2 mA
                                                        = n A  2
                                                              p    +2p mA p mO
                                                               mA
                                                        =E(n AO | Y, p m)
                                                n m,A/O
                                                                 2p mAp mO
                                                        = n A  2             .
                                                              p mA  +2p mA p mO
   38   39   40   41   42   43   44   45   46   47   48