Page 38 - Applied Probability
2 Counting Methods and the EM Algorithm
2.1 Introduction
In this chapter and the next, we undertake the study of estimation methods
and their applications in genetics. Because of the complexity of genetic
models, geneticists by and large rely on maximum likelihood estimators
rather than on competing estimators derived from minimax, invariance,
robustness, or Bayesian principles. A host of methods exists for numerically
computing maximum likelihood estimates. Some of the most appealing involve
simple counting arguments and the EM algorithm. Indeed, historically
geneticists devised many special cases of the EM algorithm before it was
generally formulated by Dempster et al. [5, 12]. Our initial example retraces
some of the steps in the long march from concrete problems to an abstract
algorithm applicable to an astonishing variety of statistical models.
2.2 Gene Counting
Suppose a geneticist takes a random sample from a population and observes
the phenotype of each individual in the sample at some autosomal locus.
How can the sample be used to estimate the frequency of an allele at the
locus? If all alleles are codominant, the answer is obvious. Simply count
the number of times the given allele appears in the sample, and divide by
the total number of genes in the sample. Remember that there are twice
as many genes as individuals.
TABLE 2.1. MN Blood Group Data

Phenotype   Genotype   Number
M           M/M        119
MN          M/N         76
N           N/N         13
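
The counting rule for codominant alleles can be sketched in a few lines of Python, applied here to the counts in Table 2.1. The function name `codominant_allele_frequencies` and the dictionary layout are illustrative assumptions rather than notation from the text. The sketch assumes every genotype is directly observable from the phenotype, which holds only under codominance.

```python
def codominant_allele_frequencies(genotype_counts):
    """Estimate allele frequencies by direct gene counting.

    genotype_counts maps a pair of alleles (a genotype) to the number
    of sampled individuals carrying that genotype. This layout is an
    assumption made for illustration.
    """
    allele_counts = {}
    for (allele1, allele2), n in genotype_counts.items():
        # Each individual contributes two genes, one per allele slot.
        allele_counts[allele1] = allele_counts.get(allele1, 0) + n
        allele_counts[allele2] = allele_counts.get(allele2, 0) + n

    # There are twice as many genes as individuals.
    total_genes = 2 * sum(genotype_counts.values())
    return {a: c / total_genes for a, c in allele_counts.items()}


# Counts from Table 2.1 (208 individuals, hence 416 genes).
mn_counts = {("M", "M"): 119, ("M", "N"): 76, ("N", "N"): 13}
freqs = codominant_allele_frequencies(mn_counts)

for allele in sorted(freqs):
    print(f"{allele}: {freqs[allele]:.3f}")
```

For these data the M allele appears 2 x 119 + 76 = 314 times among 416 genes, and the N allele appears 2 x 13 + 76 = 102 times, so the two estimates sum to one.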
Example 2.2.1 Gene Frequencies for the MN Blood Group
The MN blood group has two codominant alleles M and N. Crow [4]
cites the data from Table 2.1 on 208 Bedouins of the Syrian desert. To