Page 73 - Applied Probability

3. Newton’s Method and Scoring
                                   distribution. Use formula (3.13) and show that
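                                    \begin{align*}
                                    E(p_i^2 \mid N_1 = n_1, \ldots, N_k = n_k) &= \frac{(n_i + \alpha_i)^{\overline{2}}}{(2n + \alpha_{.})^{\overline{2}}} \\
                                    \operatorname{Var}(p_i^2 \mid N_1 = n_1, \ldots, N_k = n_k) &= \frac{(n_i + \alpha_i)^{\overline{4}}}{(2n + \alpha_{.})^{\overline{4}}} - \left[ \frac{(n_i + \alpha_i)^{\overline{2}}}{(2n + \alpha_{.})^{\overline{2}}} \right]^2 \\
                                    E(2 p_i p_j \mid N_1 = n_1, \ldots, N_k = n_k) &= \frac{2 (n_i + \alpha_i)(n_j + \alpha_j)}{(2n + \alpha_{.})^{\overline{2}}} \\
                                    \operatorname{Var}(2 p_i p_j \mid N_1 = n_1, \ldots, N_k = n_k) &= \frac{4 (n_i + \alpha_i)^{\overline{2}} (n_j + \alpha_j)^{\overline{2}}}{(2n + \alpha_{.})^{\overline{4}}} - \left[ \frac{2 (n_i + \alpha_i)(n_j + \alpha_j)}{(2n + \alpha_{.})^{\overline{2}}} \right]^2,
                                    \end{align*}
                                    where $x^{\overline{r}} = x(x+1) \cdots (x+r-1)$ denotes a rising factorial power.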
                                   It is interesting that the above mean expressions entail
                                    \begin{align*}
                                    E(p_i^2 \mid N_1 = n_1, \ldots, N_k = n_k) &> \tilde{p}_i^2 \\
                                    E(2 p_i p_j \mid N_1 = n_1, \ldots, N_k = n_k) &< 2 \tilde{p}_i \tilde{p}_j,
                                    \end{align*}
                                    where $\tilde{p}_i$ and $\tilde{p}_j$ are the posterior means of $p_i$ and $p_j$.
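
                                    The moment formulas of this problem can be checked numerically. The following
                                    Python sketch is only an illustration and not part of the exercise: the names
                                    rising and posterior_moments are hypothetical, the arguments a_i, a_j, and total
                                    stand for n_i + α_i, n_j + α_j, and 2n + α., and the numerical values are made up.
                                    Exact rational arithmetic keeps the two inequalities free of rounding error.

```python
from fractions import Fraction


def rising(x, r):
    """Rising factorial power x(x+1)...(x+r-1); the empty product is 1."""
    result = Fraction(1)
    for k in range(r):
        result *= x + k
    return result


def posterior_moments(a_i, a_j, total):
    """Posterior means and variances of p_i^2 and 2 p_i p_j.

    a_i, a_j, and total stand for n_i + alpha_i, n_j + alpha_j, and
    2n + alpha. (hypothetical argument names, not notation from the text).
    """
    mean_hom = rising(a_i, 2) / rising(total, 2)
    var_hom = rising(a_i, 4) / rising(total, 4) - mean_hom ** 2
    mean_het = 2 * a_i * a_j / rising(total, 2)
    var_het = (4 * rising(a_i, 2) * rising(a_j, 2) / rising(total, 4)
               - mean_het ** 2)
    return mean_hom, var_hom, mean_het, var_het


# Made-up posterior parameters, chosen only for illustration.
a_i, a_j, total = Fraction(7), Fraction(5), Fraction(20)
mean_hom, var_hom, mean_het, var_het = posterior_moments(a_i, a_j, total)

# Posterior means of p_i and p_j under the Dirichlet posterior.
p_i, p_j = a_i / total, a_j / total

# Both comparisons should hold: homozygote mean above, heterozygote mean below.
print(mean_hom > p_i ** 2, mean_het < 2 * p_i * p_j)
```

                                    With a = n_i + α_i and A = 2n + α., the first inequality reduces to
                                    (a + 1)/(A + 1) > a/A, which holds whenever a < A; the second reduces to
                                    1/(A + 1) < 1/A.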
                                17. Problem 5 of Chapter 2 considers haplotype frequency estimation for
                                   two linked, biallelic loci. The EM algorithm discussed there relies on
                                    the allele-counting estimates $p_A$, $p_a$, $p_B$, and $p_b$.
                                     (a) Construct the Dirichlet prior from these estimates mentioned
                                         in Section 3.8 and devise an EM algorithm that maximizes the
                                         product of the prior and the likelihood of the observed data. In
                                         particular, show that the EM update for $p_{AB}$ is
                                         \begin{align*}
                                         p_{m+1,AB} &= \frac{2n_{AABB} + n_{AABb} + n_{AaBB} + n_{mAB/ab} + \beta_{AB}}{2n + \beta} \\
                                         n_{mAB/ab} &= n_{AaBb} \, \frac{2 p_{mAB} \, p_{mab}}{2 p_{mAB} \, p_{mab} + 2 p_{mAb} \, p_{maB}},
                                         \end{align*}
                                         where $\beta_{AB} = \alpha_{AB} - 1$ and $\beta = \alpha - 4$. There are similar updates
                                         for $p_{Ab}$, $p_{aB}$, and $p_{ab}$. (Hint: The log prior passes untouched
                                         through the conditional expectation of the E step of the EM
                                         algorithm.)
                                     (b) Implement this EM algorithm on the mosquito data given in
                                         Table 2.5 of Chapter 2 for the value $\alpha - 4 = 10$ and starting
                                         from the estimated linkage equilibrium frequencies. You should
                                         find that $\hat{p}_{AB} = .717$, $\hat{p}_{Ab} = .083$, $\hat{p}_{aB} = .121$, and $\hat{p}_{ab} = .079$.
                                    (c) Describe how you would generalize the algorithm to more than
                                        two loci and more than two alleles per locus.
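
                                    As a starting point for part (b), the update of part (a) might be coded along
                                    the following lines. This is a minimal sketch under stated assumptions, not a
                                    worked solution: the function names em_map_step and em_map are hypothetical,
                                    the updates for Ab, aB, and ab are written down by symmetry with the AB update,
                                    and the genotype counts, pseudo-counts, and starting values are made up for
                                    illustration. They are not the mosquito data of Table 2.5, and the sketch does
                                    not show how the β's follow from the allele-counting estimates.

```python
def em_map_step(p, counts, beta):
    """One EM update maximizing the product of prior and likelihood.

    p      -- current haplotype frequencies, keys "AB", "Ab", "aB", "ab"
    counts -- the nine two-locus genotype counts, keys such as "AaBb"
    beta   -- beta_h = alpha_h - 1 for each haplotype h
    """
    n = sum(counts.values())
    # E step: split the double heterozygotes between phases AB/ab and Ab/aB.
    # (Assumes the current frequencies make the denominator positive.)
    cis = 2 * p["AB"] * p["ab"]
    trans = 2 * p["Ab"] * p["aB"]
    n_cis = counts["AaBb"] * cis / (cis + trans)
    n_trans = counts["AaBb"] - n_cis
    # M step: expected haplotype counts plus the prior pseudo-counts.
    numer = {
        "AB": 2 * counts["AABB"] + counts["AABb"] + counts["AaBB"] + n_cis + beta["AB"],
        "Ab": 2 * counts["AAbb"] + counts["AABb"] + counts["Aabb"] + n_trans + beta["Ab"],
        "aB": 2 * counts["aaBB"] + counts["AaBB"] + counts["aaBb"] + n_trans + beta["aB"],
        "ab": 2 * counts["aabb"] + counts["Aabb"] + counts["aaBb"] + n_cis + beta["ab"],
    }
    denom = 2 * n + sum(beta.values())
    return {h: numer[h] / denom for h in numer}


def em_map(p0, counts, beta, tol=1e-10, max_iter=1000):
    """Iterate em_map_step until successive iterates differ by less than tol."""
    p = dict(p0)
    for _ in range(max_iter):
        new_p = em_map_step(p, counts, beta)
        if max(abs(new_p[h] - p[h]) for h in p) < tol:
            return new_p
        p = new_p
    return p


# Made-up genotype counts and prior, for illustration only.
counts = {"AABB": 10, "AABb": 4, "AAbb": 1,
          "AaBB": 5, "AaBb": 8, "Aabb": 2,
          "aaBB": 1, "aaBb": 2, "aabb": 3}
beta = {"AB": 2.5, "Ab": 2.5, "aB": 2.5, "ab": 2.5}  # the betas sum to 10
p0 = {"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}

estimate = em_map(p0, counts, beta)
print(estimate)
```

                                    Because the four numerators add up to 2n + β, every iterate is automatically a
                                    probability vector, so no separate normalization step is needed.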