Page 50 - Applied Probability

2. Counting Methods and the EM Algorithm
2. Color blindness is an X-linked recessive trait. Suppose that in a
   random sample there are $f_B$ normal females, $f_b$ color-blind
   females, $m_B$ normal males, and $m_b$ color-blind males. If
   $n = 2f_B + 2f_b + m_B + m_b$ is the number of genes in the sample,
   then show that under Hardy-Weinberg equilibrium the maximum
   likelihood estimate of the frequency of the color-blindness allele is

   $$\hat{p}_b = \frac{-m_B + \sqrt{m_B^2 + 4n(m_b + 2f_b)}}{2n}.$$

   Compute the estimate $\hat{p}_b = .0772$ for the data $f_B = 9032$,
   $f_b = 40$, $m_B = 8324$, and $m_b = 725$. These data represent an
   amalgamation of cases from two distinct forms of color blindness [4].
   Protanopia, or red blindness, is determined by one X-linked locus, and
   deuteranopia, or green blindness, by a different X-linked locus.
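
   The closed-form estimate in Problem 2 can be checked numerically. The
   following is a minimal sketch, not part of the original text; the function
   names `colorblind_allele_mle` and `log_likelihood` are hypothetical, and
   the log-likelihood assumes Hardy-Weinberg proportions (color-blind
   females with probability $p^2$, color-blind males with probability $p$).

```python
from math import log, sqrt


def colorblind_allele_mle(f_B, f_b, m_B, m_b):
    """Closed-form estimate of the color-blindness allele frequency."""
    n = 2 * f_B + 2 * f_b + m_B + m_b  # number of genes in the sample
    return (-m_B + sqrt(m_B**2 + 4 * n * (m_b + 2 * f_b))) / (2 * n)


def log_likelihood(p, f_B, f_b, m_B, m_b):
    """Log-likelihood up to an additive constant (assumed HWE model)."""
    return (f_B * log(1 - p**2) + f_b * log(p**2)
            + m_B * log(1 - p) + m_b * log(p))


data = (9032, 40, 8324, 725)  # f_B, f_b, m_B, m_b from the problem
p_hat = colorblind_allele_mle(*data)
print(round(p_hat, 4))  # 0.0772

# Nearby values of p should not give a higher log-likelihood.
best = log_likelihood(p_hat, *data)
is_local_max = all(log_likelihood(p_hat + d, *data) <= best
                   for d in (-1e-3, -1e-4, 1e-4, 1e-3))
print(is_local_max)
```

   The neighbor comparison is only a sanity check on the formula; it does
   not replace the derivation the problem asks for.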
3. Consider a codominant, autosomal locus with $k$ alleles. In a random
   sample of $n$ people, let $n_i$ be the number of genes of allele $i$.
   Show that the gene-counting estimates $\hat{p}_i = n_i/(2n)$ are
   maximum likelihood estimates.
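
   One possible route for Problem 3, offered as a sketch rather than the
   intended solution: assume Hardy-Weinberg genotype frequencies and
   maximize the multinomial log-likelihood with a Lagrange multiplier.
   The genotype counts $n_{ij} = n_{ji}$ (number of people with genotype
   $i/j$) and the multiplier $\lambda$ are notation introduced here, not
   taken from the text.

```latex
\begin{aligned}
L(p) &= \text{const} \times \prod_i (p_i^2)^{n_{ii}} \prod_{i<j} (2 p_i p_j)^{n_{ij}}, \\
\ln L(p) &= \text{const} + \sum_i n_i \ln p_i,
  \qquad n_i = 2 n_{ii} + \sum_{j \ne i} n_{ij}, \\
0 &= \frac{\partial}{\partial p_i}
  \Big[ \ln L(p) - \lambda \Big( \sum_j p_j - 1 \Big) \Big]
  = \frac{n_i}{p_i} - \lambda
  \;\Longrightarrow\; p_i = \frac{n_i}{\lambda}, \\
\sum_i p_i = 1 &\;\Longrightarrow\; \lambda = \sum_i n_i = 2n,
  \qquad \hat{p}_i = \frac{n_i}{2n}.
\end{aligned}
```

   Because $\sum_i n_i \ln p_i$ is concave in $p$, the stationary point is
   a maximum.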
4. In forensic applications of DNA fingerprinting, match probabilities
   $p_i^2$ for homozygotes and $2p_ip_j$ for heterozygotes are computed
   [1]. In practice, the frequencies $p_i$ can only be estimated. Assuming
   codominant alleles and the estimates $\hat{p}_i = n_i/(2n)$ given in
   the previous problem, show that the natural match probability
   estimates satisfy

   $$\begin{aligned}
   \mathrm{E}(\hat{p}_i^2) &= p_i^2 + \frac{p_i(1 - p_i)}{2n} \\
   \mathrm{Var}(\hat{p}_i^2) &= \frac{4p_i^3(1 - p_i)}{2n} + O\!\left(\frac{1}{n^2}\right) \\
   \mathrm{E}(2\hat{p}_i\hat{p}_j) &= 2p_ip_j - \frac{2p_ip_j}{2n} \\
   \mathrm{Var}(2\hat{p}_i\hat{p}_j) &= \frac{4p_ip_j}{2n}\,[p_i + p_j - 4p_ip_j] + O\!\left(\frac{1}{n^2}\right).
   \end{aligned}$$

   (Hint: The $n_i$ have joint moment-generating function
   $\left(\sum_i p_i e^{s_i}\right)^{2n}$.)
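
   The four moment formulas in Problem 4 can be checked numerically before
   attempting the proof. The sketch below is not from the original text. It
   assumes the gene counts $(n_i, n_j, \text{rest})$ are multinomial with
   $2n$ trials, and it enumerates every outcome exactly instead of
   simulating. The function name `match_probability_moments` and the
   values $n = 100$, $p_i = 0.3$, $p_j = 0.2$ are illustrative choices.

```python
from math import comb


def match_probability_moments(n, p_i, p_j):
    """Exact mean and variance of p_hat_i^2 and of 2*p_hat_i*p_hat_j.

    Enumerates all counts (a, b, c) of alleles i, j, and all other
    alleles among the 2n sampled genes (assumed multinomial).
    """
    m = 2 * n
    p_rest = 1.0 - p_i - p_j
    e_hom = e_hom_sq = e_het = e_het_sq = 0.0
    for a in range(m + 1):
        for b in range(m - a + 1):
            c = m - a - b
            prob = comb(m, a) * comb(m - a, b) * p_i**a * p_j**b * p_rest**c
            hom = (a / m) ** 2             # estimate of p_i^2
            het = 2 * (a / m) * (b / m)    # estimate of 2 p_i p_j
            e_hom += prob * hom
            e_hom_sq += prob * hom**2
            e_het += prob * het
            e_het_sq += prob * het**2
    return e_hom, e_hom_sq - e_hom**2, e_het, e_het_sq - e_het**2


n, p_i, p_j = 100, 0.3, 0.2  # illustrative values only
mean_hom, var_hom, mean_het, var_het = match_probability_moments(n, p_i, p_j)

# Formulas from the problem statement (variances: leading terms only).
mean_hom_formula = p_i**2 + p_i * (1 - p_i) / (2 * n)
var_hom_leading = 4 * p_i**3 * (1 - p_i) / (2 * n)
mean_het_formula = 2 * p_i * p_j - 2 * p_i * p_j / (2 * n)
var_het_leading = 4 * p_i * p_j / (2 * n) * (p_i + p_j - 4 * p_i * p_j)

print(mean_hom, mean_hom_formula)
print(var_hom, var_hom_leading)
print(mean_het, mean_het_formula)
print(var_het, var_het_leading)
```

   The two means should agree to rounding error, since those formulas are
   exact. The variances should differ only by the $O(1/n^2)$ remainder.
   The enumeration converts large binomial coefficients to floating point,
   so it is suitable only for moderate $n$.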
5. Consider two loci in Hardy-Weinberg equilibrium, but possibly not in
   linkage equilibrium. Devise an EM algorithm for estimating the gamete
   frequencies $p_{AB}$, $p_{Ab}$, $p_{aB}$, and $p_{ab}$, where $A$ and
   $a$ are the two alleles at the first locus and $B$ and $b$ are the two
   alleles at the second locus [17]. In a random sample of $n$
   individuals, let $n_{AABB}$ denote the observed number of individuals
   of genotype $A/A$ at the first locus and of genotype $B/B$ at the
   second locus. Denote the eight additional observed double-genotype
   frequencies similarly. The only one of these observed numbers
   entailing any ambiguity is $n_{AaBb}$; for