Applied Probability

2. Counting Methods and the EM Algorithm
the investigator must correct for the ascertainment process. The simplest ascertainment model postulates that the number of ascertained siblings follows a binomial distribution with success probability $\pi$ and number of trials equal to the number of affected siblings. In effect, families are ascertained only through their affected siblings, and affected siblings come to the attention of the genetic investigator independently, with common probability $\pi$ per affected sibling. The number of affected siblings likewise follows a binomial distribution with success probability $p$ and number of trials equal to the number of siblings. The EM algorithm can be employed to estimate $p$ and $\pi$ jointly. More complicated and realistic ascertainment models are discussed in [14].
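To make the sampling scheme concrete, here is a minimal simulation sketch. It assumes, purely for illustration, that every at-risk family has the same hypothetical size of four siblings; the function name `simulate_ascertained` and the parameter values are our own choices. Families in which no affected sibling is ascertained are silently dropped, which is exactly the selection the investigator must correct for.

```python
import random


def simulate_ascertained(sizes, p, pi, rng):
    """Simulate the simple binomial ascertainment model (illustrative sketch).

    Each sibling is affected with probability p; each affected sibling is
    ascertained independently with probability pi. Only families with at
    least one ascertained sibling are returned, as (s, r, a) triples.
    """
    observed = []
    for s in sizes:
        # r ~ Binomial(s, p): number of affected siblings
        r = sum(rng.random() < p for _ in range(s))
        # a ~ Binomial(r, pi): number of affected siblings ascertained
        a = sum(rng.random() < pi for _ in range(r))
        if a > 0:  # families with a == 0 never reach the investigator
            observed.append((s, r, a))
    return observed


rng = random.Random(1)
# Hypothetical setup: 1000 at-risk families, each with 4 siblings
families = simulate_ascertained([4] * 1000, p=0.25, pi=0.5, rng=rng)
print(len(families), families[:3])
```

Only about $1 - (1 - p\pi)^4$ of the at-risk families survive the selection, and the survivors over-represent affected siblings relative to the binomial($s$, $p$) model.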
Suppose that the $k$th ascertained family has $s_k$ siblings, of whom $r_k$ are affected and $a_k$ are ascertained. The numbers $r_k$ and $a_k$ constitute the observed data $Y_k$ for the $k$th ascertained family. The missing data consist of the number of at-risk families that were missed in the ascertainment process and the corresponding statistics $r_k$ and $a_k = 0$ for each of these missing families. The likelihood of the observed data is

$$
\prod_k \frac{\binom{s_k}{r_k} p^{r_k} (1-p)^{s_k - r_k} \binom{r_k}{a_k} \pi^{a_k} (1-\pi)^{r_k - a_k}}{1 - (1 - p\pi)^{s_k}},
$$
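where the product extends only over the ascertained families. The denominator $1 - (1 - p\pi)^{s_k}$ in this likelihood is the probability that a family with $s_k$ siblings is ascertained: each sibling is independently both affected and ascertained with probability $p\pi$, so the family is missed with probability $(1 - p\pi)^{s_k}$.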
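A minimal sketch of this ascertainment-corrected log-likelihood, assuming each family is stored as a hypothetical `(s, r, a)` tuple and that $0 < p < 1$ and $0 < \pi < 1$ (the logarithms fail at the boundary). Each family contributes the log of its two binomial terms minus the log of its ascertainment probability.

```python
from math import comb, log


def observed_loglik(families, p, pi):
    """Log of the ascertainment-corrected likelihood for (s, r, a) triples.

    Assumes 0 < p < 1 and 0 < pi < 1 so every logarithm is finite.
    """
    total = 0.0
    for s, r, a in families:
        # binomial probability of r affected among s siblings
        total += log(comb(s, r)) + r * log(p) + (s - r) * log(1 - p)
        # binomial probability of a ascertained among r affected
        total += log(comb(r, a)) + a * log(pi) + (r - a) * log(1 - pi)
        # condition on ascertainment: divide by 1 - (1 - p*pi)^s
        total -= log(1 - (1 - p * pi) ** s)
    return total


print(observed_loglik([(3, 1, 1), (2, 2, 1)], 0.3, 0.6))
```

For a fixed family size, the corrected probabilities sum to one over all outcomes with $a_k \ge 1$, which is a convenient check that the denominator is right.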
These denominators disappear in the complete-data likelihood

$$
\prod_k \binom{s_k}{r_k} p^{r_k} (1-p)^{s_k - r_k} \binom{r_k}{a_k} \pi^{a_k} (1-\pi)^{r_k - a_k}
$$
because we no longer condition on the event of ascertainment for each family. This simplification is partially offset by the added complication that the product now extends over both the ascertained families and the at-risk unascertained families. If $\theta = (p, \pi)$, $r_{mk} = E(r_k \mid Y_k, \theta_m)$, and $a_{mk} = E(a_k \mid Y_k, \theta_m)$, then the E step of the EM algorithm amounts to forming

$$
Q(\theta \mid \theta_m) = \sum_k \Big[ r_{mk} \ln p + (s_k - r_{mk}) \ln(1-p) + a_{mk} \ln \pi + (r_{mk} - a_{mk}) \ln(1-\pi) \Big].
$$
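For an ascertained family both $r_k$ and $a_k$ are observed, so $r_{mk} = r_k$ and $a_{mk} = a_k$. For an unascertained family $a_k = 0$, hence $a_{mk} = 0$. Given that no sibling was ascertained, each sibling is affected with conditional probability $p(1-\pi)/(1-p\pi)$, which gives $r_{mk} = s_k p(1-\pi)/(1-p\pi)$ evaluated at $\theta_m$. The sketch below (function names hypothetical) computes this closed form and checks it against a brute-force sum over $r_k$.

```python
from math import comb


def expected_affected_given_missed(s, p, pi):
    """E(r_k | a_k = 0) for a family of size s under the binomial model.

    Each sibling is independently unaffected (1 - p), affected but not
    ascertained (p * (1 - pi)), or affected and ascertained (p * pi).
    Given that nobody was ascertained, each sibling is affected with
    probability p * (1 - pi) / (1 - p * pi).
    """
    return s * p * (1 - pi) / (1 - p * pi)


def brute_force(s, p, pi):
    """Same expectation by direct summation over r with a = 0 (a check)."""
    num = den = 0.0
    for r in range(s + 1):
        # P(r affected and none of them ascertained)
        w = comb(s, r) * p**r * (1 - p) ** (s - r) * (1 - pi) ** r
        num += r * w
        den += w
    return num / den


print(expected_affected_given_missed(4, 0.3, 0.6), brute_force(4, 0.3, 0.6))
```

Because $Q$ is linear in $r_{mk}$ and $a_{mk}$, the E step needs only these conditional means, not the full conditional distributions.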

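The M step requires solving the equations

$$
\begin{aligned}
\sum_k \left[ \frac{r_{mk}}{p} - \frac{s_k - r_{mk}}{1-p} \right] &= 0, \\
\sum_k \left[ \frac{a_{mk}}{\pi} - \frac{r_{mk} - a_{mk}}{1-\pi} \right] &= 0.
\end{aligned}
$$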
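Clearing denominators, the two equations have the closed-form solutions $p_{m+1} = \sum_k r_{mk} / \sum_k s_k$ and $\pi_{m+1} = \sum_k a_{mk} / \sum_k r_{mk}$. The sketch below runs the full iteration. The passage above does not specify how the unknown number of missed families enters the E step, so the code adopts one convenient assumption: each ascertained family of size $s$ is accompanied by a geometric number of missed families of the same size, with mean $q/(1-q)$ where $q = (1 - p\pi)^s$. Summing that geometric series reproduces the denominator $1 - (1 - p\pi)^s$ of the observed-data likelihood, so under this assumption each iteration cannot decrease that likelihood. The function name, starting values, and stopping rule are hypothetical choices.

```python
def em_ascertainment(families, p=0.5, pi=0.5, tol=1e-10, max_iter=10_000):
    """Estimate (p, pi) from ascertained (s, r, a) triples by EM (sketch).

    ASSUMPTION (not from the text): each ascertained family of size s is
    accompanied by a geometric number of missed families of the same size,
    with mean q / (1 - q), where q = (1 - p*pi)**s is the miss probability.
    """
    for _ in range(max_iter):
        tot_s = tot_r = tot_a = 0.0
        for s, r, a in families:
            q = (1 - p * pi) ** s        # chance a family of size s is missed
            n_missed = q / (1 - q)       # expected number of missed families
            # E step: observed counts plus expected counts for missed families
            tot_s += s + n_missed * s
            tot_r += r + n_missed * s * p * (1 - pi) / (1 - p * pi)
            tot_a += a                   # a_mk = 0 for every missed family
        # M step: closed-form roots of the two score equations
        p_new, pi_new = tot_r / tot_s, tot_a / tot_r
        converged = abs(p_new - p) + abs(pi_new - pi) < tol
        p, pi = p_new, pi_new
        if converged:
            break
    return p, pi


# Hypothetical toy data: (siblings, affected, ascertained) per family
families = [(3, 1, 1), (3, 2, 1), (2, 1, 1), (4, 2, 2), (3, 1, 1), (2, 2, 1)]
p_hat, pi_hat = em_ascertainment(families)
print(p_hat, pi_hat)
```

Neither update needs numerical root finding, because each score equation can be solved for its own parameter once the expected counts are fixed.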