Page 50 - Applied Probability
2. Counting Methods and the EM Algorithm
2. Color blindness is an X-linked recessive trait. Suppose that in a random
sample there are $f_B$ normal females, $f_b$ color-blind females, $m_B$
normal males, and $m_b$ color-blind males. If $n = 2f_B + 2f_b + m_B + m_b$
is the number of genes in the sample, then show that under Hardy-Weinberg
equilibrium the maximum likelihood estimate of the frequency of the
color-blindness allele is
$$\hat{p}_b = \frac{-m_B + \sqrt{m_B^2 + 4n(m_b + 2f_b)}}{2n}.$$
Compute the estimate $\hat{p}_b = .0772$ for the data $f_B = 9032$, $f_b = 40$,
$m_B = 8324$, and $m_b = 725$. These data represent an amalgamation
of cases from two distinct forms of color blindness [4]. Protanopia, or
red blindness, is determined by one X-linked locus, and deuteranopia,
or green blindness, by a different X-linked locus.
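As a numerical check on Problem 2, the following is a minimal sketch rather than a full solution. It assumes the Hardy-Weinberg genotype probabilities $1 - p^2$ and $p^2$ for normal and color-blind females and $1 - p$ and $p$ for normal and color-blind males, under which setting the score to zero reduces to the quadratic $n p^2 + m_B p - (m_b + 2f_b) = 0$. The function names `color_blind_allele_mle` and `score` are illustrative and do not come from the text.

```python
from math import sqrt


def color_blind_allele_mle(f_B, f_b, m_B, m_b):
    """Positive root of n p^2 + m_B p - (m_b + 2 f_b) = 0."""
    n = 2 * f_B + 2 * f_b + m_B + m_b  # number of X-linked genes sampled
    return (-m_B + sqrt(m_B**2 + 4 * n * (m_b + 2 * f_b))) / (2 * n)


def score(p, f_B, f_b, m_B, m_b):
    """Derivative of the assumed Hardy-Weinberg loglikelihood in p."""
    return (
        -2 * p * f_B / (1 - p**2)  # normal females, probability 1 - p^2
        + (2 * f_b + m_b) / p      # color-blind females (p^2) and males (p)
        - m_B / (1 - p)            # normal males, probability 1 - p
    )


p_hat = color_blind_allele_mle(9032, 40, 8324, 725)
print(round(p_hat, 4))  # 0.0772
```

Evaluating `score` at `p_hat` should give a value that is zero up to rounding error, which confirms that the closed-form root is a stationary point of the loglikelihood.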
3. Consider a codominant, autosomal locus with $k$ alleles. In a random
sample of $n$ people, let $n_i$ be the number of genes of allele $i$. Show that
the gene-counting estimates $\hat{p}_i = n_i/(2n)$ are maximum likelihood
estimates.
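One possible line of attack on Problem 3 is sketched below. It assumes Hardy-Weinberg equilibrium, as elsewhere in this chapter, and introduces two symbols that are not in the problem statement: $n_{ij}$, the number of people with genotype $i/j$, and a Lagrange multiplier $\lambda$. It is an outline under those assumptions, not necessarily the intended solution.

```latex
% Likelihood under Hardy-Weinberg proportions (assumed), up to a constant:
L(p) \;\propto\; \prod_{i} (p_i^2)^{n_{ii}} \prod_{i<j} (2 p_i p_j)^{n_{ij}}
     \;\propto\; \prod_{i} p_i^{n_i},
\qquad n_i = 2 n_{ii} + \sum_{j \ne i} n_{ij}.

% Maximize the loglikelihood subject to \sum_i p_i = 1:
\mathcal{L}(p,\lambda) = \sum_{i} n_i \ln p_i + \lambda \Bigl(1 - \sum_{i} p_i\Bigr),
\qquad
\frac{\partial \mathcal{L}}{\partial p_i} = \frac{n_i}{p_i} - \lambda = 0 .

% Summing n_i = \lambda p_i over i gives \lambda = \sum_i n_i = 2n, hence
\hat{p}_i = \frac{n_i}{2n}.
```

Because $\sum_i n_i \ln p_i$ is concave in $p$, the stationary point found this way is a maximum.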
4. In forensic applications of DNA fingerprinting, match probabilities
$p_i^2$ for homozygotes and $2p_ip_j$ for heterozygotes are computed [1]. In
practice, the frequencies $p_i$ can only be estimated. Assuming codominant
alleles and the estimates $\hat{p}_i = n_i/(2n)$ given in the previous
problem, show that the natural match probability estimates satisfy
$$\begin{aligned}
E(\hat{p}_i^2) &= p_i^2 + \frac{p_i(1-p_i)}{2n} \\
\operatorname{Var}(\hat{p}_i^2) &= \frac{4p_i^3(1-p_i)}{2n} + O\!\left(\frac{1}{n^2}\right) \\
E(2\hat{p}_i\hat{p}_j) &= 2p_ip_j - \frac{2p_ip_j}{2n} \\
\operatorname{Var}(2\hat{p}_i\hat{p}_j) &= \frac{4p_ip_j}{2n}\,[p_i + p_j - 4p_ip_j] + O\!\left(\frac{1}{n^2}\right).
\end{aligned}$$
(Hint: The $n_i$ have joint moment-generating function $\left(\sum_i p_i e^{s_i}\right)^{2n}$.)
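The four moment formulas in Problem 4 can be checked numerically before attempting the proof. The sketch below assumes that the allele counts follow a multinomial distribution with $2n$ trials, which is what the hinted moment-generating function implies, and computes the moments by exact summation rather than simulation. The function names and the test values $n = 50$, $p_i = 0.3$, $p_j = 0.2$ are illustrative choices, not taken from the text.

```python
from math import comb


def homozygote_moments(n, p):
    """Exact mean and variance of (n_i / 2n)^2 for n_i ~ Binomial(2n, p)."""
    m = 2 * n
    mean = second = 0.0
    for k in range(m + 1):
        prob = comb(m, k) * p**k * (1 - p) ** (m - k)
        x = (k / m) ** 2
        mean += prob * x
        second += prob * x * x
    return mean, second - mean**2


def heterozygote_moments(n, p_i, p_j):
    """Exact mean and variance of 2 (n_i / 2n)(n_j / 2n) under a
    trinomial distribution on (allele i, allele j, all other alleles)."""
    m = 2 * n
    rest = 1 - p_i - p_j
    mean = second = 0.0
    for a in range(m + 1):
        for b in range(m + 1 - a):
            prob = comb(m, a) * comb(m - a, b) * p_i**a * p_j**b * rest ** (m - a - b)
            x = 2 * (a / m) * (b / m)
            mean += prob * x
            second += prob * x * x
    return mean, second - mean**2


n, p_i, p_j = 50, 0.3, 0.2  # illustrative values only

hom_mean, hom_var = homozygote_moments(n, p_i)
het_mean, het_var = heterozygote_moments(n, p_i, p_j)

# Closed forms from the problem statement (variances without the O(1/n^2) term).
hom_mean_formula = p_i**2 + p_i * (1 - p_i) / (2 * n)
hom_var_leading = 4 * p_i**3 * (1 - p_i) / (2 * n)
het_mean_formula = 2 * p_i * p_j - 2 * p_i * p_j / (2 * n)
het_var_leading = 4 * p_i * p_j / (2 * n) * (p_i + p_j - 4 * p_i * p_j)

print(hom_mean - hom_mean_formula, hom_var - hom_var_leading)
print(het_mean - het_mean_formula, het_var - het_var_leading)
```

The two mean formulas are exact, so the first difference on each printed line should vanish up to rounding error. The variance differences should be small and shrink roughly like $1/n^2$ as `n` grows, consistent with the stated remainder terms.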
5. Consider two loci in Hardy-Weinberg equilibrium, but possibly not
in linkage equilibrium. Devise an EM algorithm for estimating the
gamete frequencies $p_{AB}$, $p_{Ab}$, $p_{aB}$, and $p_{ab}$, where $A$ and $a$ are the
two alleles at the first locus and $B$ and $b$ are the two alleles at the
second locus [17]. In a random sample of $n$ individuals, let $n_{AABB}$
denote the observed number of individuals of genotype $A/A$ at the
first locus and of genotype $B/B$ at the second locus. Denote the eight
additional observed double-genotype frequencies similarly. The only
one of these observed numbers entailing any ambiguity is $n_{AaBb}$; for