Page 40 - Applied Probability

2. Counting Methods and the EM Algorithm
                              iteration m
                              $$n_{m,A/A} = n_A \, \frac{p_{mA}^2}{p_{mA}^2 + 2 p_{mA} p_{mO}}$$
                              people to genotype A/A and
                              $$n_{m,A/O} = n_A \, \frac{2 p_{mA} p_{mO}}{p_{mA}^2 + 2 p_{mA} p_{mO}}$$
                              people to genotype A/O. We now update $p_{mA}$ by
                              $$p_{m+1,A} = \frac{2 n_{m,A/A} + n_{m,A/O} + n_{AB}}{2n}. \tag{2.2}$$
                              The update for $p_{mB}$ is the same as (2.2) except for the interchange of the
                              labels A and B. The update for $p_{mO}$ is equally intuitive and preserves
                              the counting requirement $p_{mA} + p_{mB} + p_{mO} = 1$. This iterative
                              procedure continues until $p_{mA}$, $p_{mB}$, and $p_{mO}$ converge. Their
                              converged values $p_{\infty A}$, $p_{\infty B}$, and $p_{\infty O}$ provide allele
                              frequency estimates. This gene-counting algorithm [12] is a special case of
                              the EM algorithm.
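
The updates for alleles B and O are described above but not written out. As a sketch that follows the same counting logic as (2.2), and not a formula quoted from the text, they can be written as below. The symbols $n_{m,B/B}$ and $n_{m,B/O}$ are assumed names, chosen by analogy with $n_{m,A/A}$ and $n_{m,A/O}$.

```latex
\begin{align*}
n_{m,B/B} &= n_B \, \frac{p_{mB}^2}{p_{mB}^2 + 2 p_{mB} p_{mO}}, &
n_{m,B/O} &= n_B \, \frac{2 p_{mB} p_{mO}}{p_{mB}^2 + 2 p_{mB} p_{mO}}, \\
p_{m+1,B} &= \frac{2 n_{m,B/B} + n_{m,B/O} + n_{AB}}{2n}, &
p_{m+1,O} &= \frac{n_{m,A/O} + n_{m,B/O} + 2 n_O}{2n}.
\end{align*}
```

Adding the three numerators gives $2(n_{m,A/A} + n_{m,A/O}) + 2(n_{m,B/B} + n_{m,B/O}) + 2 n_{AB} + 2 n_O = 2(n_A + n_B + n_{AB} + n_O) = 2n$, so the updated frequencies sum to 1.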

                              Example 2.2.2 Gene Frequencies for the ABO Blood Group
                                As a practical example, let $n_A = 186$, $n_B = 38$, $n_{AB} = 13$, and $n_O = 284$.
                              These are the types of 521 duodenal ulcer patients gathered by Clarke et
                              al. [2]. As an initial guess, take $p_{0A} = .3$, $p_{0B} = .2$, and $p_{0O} = .5$. The
                              gene-counting iterations can be done on a pocket calculator. It is evident
                              from Table 2.2 that convergence occurs quickly.


                              TABLE 2.2. Iterations for ABO Duodenal Ulcer Data

                              Iteration m    $p_{mA}$    $p_{mB}$    $p_{mO}$
                                   0          .3000       .2000       .5000
                                   1          .2321       .0550       .7129
                                   2          .2160       .0503       .7337
                                   3          .2139       .0502       .7359
                                   4          .2136       .0501       .7363
                                   5          .2136       .0501       .7363
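
The gene-counting iterations for this example can also be carried out in a few lines of code. The following is a minimal sketch, not the author's implementation: the function names `gene_counting_step` and `gene_counting` are hypothetical, and the updates for the B and O alleles are assumed to follow the same counting pattern as (2.2).

```python
def gene_counting_step(p, counts):
    """One gene-counting update for ABO allele frequencies.

    p      : tuple (p_A, p_B, p_O) of current allele frequency estimates
    counts : tuple (n_A, n_B, n_AB, n_O) of observed phenotype counts
    """
    p_a, p_b, p_o = p
    n_a, n_b, n_ab, n_o = counts
    n = n_a + n_b + n_ab + n_o

    # Split phenotype A into expected genotype counts A/A and A/O.
    denom_a = p_a**2 + 2 * p_a * p_o
    n_aa = n_a * p_a**2 / denom_a
    n_ao = n_a * 2 * p_a * p_o / denom_a

    # Same split for phenotype B (labels A and B interchanged).
    denom_b = p_b**2 + 2 * p_b * p_o
    n_bb = n_b * p_b**2 / denom_b
    n_bo = n_b * 2 * p_b * p_o / denom_b

    # Count alleles; each person carries two genes, hence the 2n.
    new_a = (2 * n_aa + n_ao + n_ab) / (2 * n)  # equation (2.2)
    new_b = (2 * n_bb + n_bo + n_ab) / (2 * n)  # (2.2) with A and B swapped
    new_o = (n_ao + n_bo + 2 * n_o) / (2 * n)   # assumed form of the O update
    return new_a, new_b, new_o


def gene_counting(p0, counts, iterations):
    """Return the list of estimates for iterations 0, 1, ..., iterations."""
    history = [tuple(p0)]
    for _ in range(iterations):
        history.append(gene_counting_step(history[-1], counts))
    return history


# Duodenal ulcer data and initial guess from Example 2.2.2.
history = gene_counting((0.3, 0.2, 0.5), (186, 38, 13, 284), iterations=5)
for m, (p_a, p_b, p_o) in enumerate(history):
    print(f"{m}  {p_a:.4f}  {p_b:.4f}  {p_o:.4f}")
```

Under these assumptions the printed values should agree with Table 2.2 up to rounding in the fourth decimal place, and each row sums to 1.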





                              2.3 Description of the EM Algorithm


                              A sharp distinction is drawn in the EM algorithm between the observed,
                              incomplete data Y and the unobserved, complete data X of a statistical