Page 51 - Applied Probability

2. Counting Methods and the EM Algorithm
individuals of this genotype, phase cannot be discerned. Now show
that the EM update for $p_{AB}$ is

$$p_{m+1,AB} = \frac{2n_{AABB} + n_{AABb} + n_{AaBB} + n_{mAB/ab}}{2n}$$

$$n_{mAB/ab} = n_{AaBb}\,\frac{2p_{mAB}\,p_{mab}}{2p_{mAB}\,p_{mab} + 2p_{mAb}\,p_{maB}}.$$

There are similar updates for $p_{Ab}$, $p_{aB}$, and $p_{ab}$, but these can be
dispensed with if one notes that for all $m$

$$p_A = p_{mAB} + p_{mAb}$$
$$p_B = p_{mAB} + p_{maB}$$
$$1 = p_{mAB} + p_{mAb} + p_{maB} + p_{mab},$$

where $p_A$ and $p_B$ are the gene-counting estimates of the frequencies
of alleles A and B. Implement this EM algorithm on the mosquito
data [17] given in Table 2.5. You should find that $\hat{p}_{AB} = .73$.
TABLE 2.5. Mosquito Data at the Idh1 and Mdh Loci

    $n_{AABB} = 19$    $n_{AABb} = 5$    $n_{AAbb} = 0$
    $n_{AaBB} = 8$     $n_{AaBb} = 8$    $n_{Aabb} = 0$
    $n_{aaBB} = 0$     $n_{aaBb} = 0$    $n_{aabb} = 0$
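The EM iteration for the AB haplotype frequency can be sketched in a few lines of Python. This is only an illustrative sketch, not a reference solution: the function name `em_p_AB`, the convergence tolerance, and the linkage-equilibrium starting value $p_{AB} = p_A p_B$ are assumptions made here, not part of the exercise. The other three haplotype frequencies are recovered at each step from the gene-counting constraints rather than updated separately.

```python
# Genotype counts from Table 2.5 (mosquito data at the Idh1 and Mdh loci).
COUNTS = {
    "AABB": 19, "AABb": 5, "AAbb": 0,
    "AaBB": 8,  "AaBb": 8, "Aabb": 0,
    "aaBB": 0,  "aaBb": 0, "aabb": 0,
}


def em_p_AB(counts, tol=1e-12, max_iter=10_000):
    """Iterate the EM update for the AB haplotype frequency."""
    n = sum(counts.values())  # n individuals, hence 2n haplotypes

    # Gene-counting estimates of the allele frequencies of A and B.
    freq_A = (2 * (counts["AABB"] + counts["AABb"] + counts["AAbb"])
              + counts["AaBB"] + counts["AaBb"] + counts["Aabb"]) / (2 * n)
    freq_B = (2 * (counts["AABB"] + counts["AaBB"] + counts["aaBB"])
              + counts["AABb"] + counts["AaBb"] + counts["aaBb"]) / (2 * n)

    # Assumed starting value: linkage equilibrium.
    p_AB = freq_A * freq_B

    for _ in range(max_iter):
        # Remaining haplotype frequencies follow from the constraints.
        p_Ab = freq_A - p_AB
        p_aB = freq_B - p_AB
        p_ab = 1.0 - p_AB - p_Ab - p_aB

        # E step: expected number of AaBb individuals with phase AB/ab.
        cis = 2 * p_AB * p_ab
        trans = 2 * p_Ab * p_aB
        n_cis = counts["AaBb"] * cis / (cis + trans)

        # M step: count AB haplotypes and divide by 2n.
        p_new = (2 * counts["AABB"] + counts["AABb"]
                 + counts["AaBB"] + n_cis) / (2 * n)

        if abs(p_new - p_AB) < tol:
            return p_new
        p_AB = p_new

    return p_AB


p_AB_hat = em_p_AB(COUNTS)
print(f"estimated p_AB = {p_AB_hat:.4f}")
```

The printed estimate can be compared with the value $\hat{p}_{AB} = .73$ quoted in the exercise.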



6. In a genetic linkage experiment, AB/ab animals are crossed to measure
the recombination fraction $\theta$ between two loci with alleles A and
a at the first locus and alleles B and b at the second locus. In this
design the dominant alleles A and B are in the coupling phase. Verify
that the offspring of an AB/ab × AB/ab mating fall into the four
categories AB, Ab, aB, and ab with probabilities

$$\pi_1 = \frac{1}{2} + \frac{(1-\theta)^2}{4}, \quad \pi_2 = \frac{1-(1-\theta)^2}{4}, \quad \pi_3 = \frac{1-(1-\theta)^2}{4}, \quad \pi_4 = \frac{(1-\theta)^2}{4},$$

respectively. Devise an EM algorithm to estimate $\theta$, and apply it
to the counts

$$(y_1, y_2, y_3, y_4) = (125, 18, 20, 34)$$

observed on 197 offspring of such matings. You should find the maximum
likelihood estimate $\hat{\theta} = .2083$ [11]. (Hints: Split the first
category into two so that there are five categories for the complete data.
Reparameterize by setting $\phi = (1-\theta)^2$.)
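One possible reading of the hints is sketched below in Python. It is an assumption-laden illustration rather than the intended solution: the first category is split into a part with probability $1/2$ and a part with probability $\phi/4$, the E step imputes the count $x_2$ falling in the second part, and the M step maximizes the resulting complete-data multinomial likelihood in $\phi$. The function name `em_theta`, the starting value $\phi = 0.5$, and the tolerance are choices made here.

```python
import math

# Observed counts (y1, y2, y3, y4) from the exercise.
Y = (125, 18, 20, 34)


def em_theta(y, phi=0.5, tol=1e-12, max_iter=10_000):
    """EM for phi = (1 - theta)^2; returns (theta, phi)."""
    y1, y2, y3, y4 = y
    for _ in range(max_iter):
        # E step: expected part of y1 belonging to the phi/4 subcategory,
        # given that category 1 has total probability 1/2 + phi/4.
        x2 = y1 * (phi / 4) / (0.5 + phi / 4)

        # M step: complete-data likelihood is proportional to
        # phi^(x2 + y4) * (1 - phi)^(y2 + y3), maximized by this ratio.
        phi_new = (x2 + y4) / (x2 + y2 + y3 + y4)

        converged = abs(phi_new - phi) < tol
        phi = phi_new
        if converged:
            break

    # Undo the reparameterization phi = (1 - theta)^2.
    return 1.0 - math.sqrt(phi), phi


theta_hat, phi_hat = em_theta(Y)
print(f"theta = {theta_hat:.4f}, phi = {phi_hat:.4f}")
```

The printed estimate of $\theta$ can be compared with the value $.2083$ quoted in the exercise. Working in $\phi$ keeps the M step in closed form, which is presumably why the hint suggests the reparameterization.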
7. In an inbred population, the inbreeding coefficient $f$ is the probability
that two genes of a random person at some locus are both copies of
the same ancestral gene. Assume that there are $k$ codominant alleles
and that $p_i$ is the frequency of allele $A_i$. Show that $f p_i + (1-f) p_i^2$