Page 44 - Applied Probability
2. Counting Methods and the EM Algorithm
The conditional expectations $n_{m,B/B}$ and $n_{m,B/O}$ are given by similar
expressions.
The M step of the EM algorithm maximizes the function $Q(p \mid p_m)$ derived
from (2.3) by replacing $n_{A/A}$ by $n_{m,A/A}$, and so forth. Maximization
of $Q(p \mid p_m)$ can be accomplished by introducing a Lagrange multiplier and
finding a stationary point of the unconstrained function
$$H(p, \lambda) = Q(p \mid p_m) + \lambda (p_A + p_B + p_O - 1).$$
Setting the partial derivatives
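$$
\begin{aligned}
\frac{\partial}{\partial p_A} H(p, \lambda) &= \frac{2n_{m,A/A}}{p_A} + \frac{n_{m,A/O}}{p_A} + \frac{n_{AB}}{p_A} + \lambda \\
\frac{\partial}{\partial p_B} H(p, \lambda) &= \frac{2n_{m,B/B}}{p_B} + \frac{n_{m,B/O}}{p_B} + \frac{n_{AB}}{p_B} + \lambda \\
\frac{\partial}{\partial p_O} H(p, \lambda) &= \frac{n_{m,A/O}}{p_O} + \frac{n_{m,B/O}}{p_O} + \frac{2n_O}{p_O} + \lambda \\
\frac{\partial}{\partial \lambda} H(p, \lambda) &= p_A + p_B + p_O - 1
\end{aligned}
$$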
equal to 0 provides the unique stationary point of H(p, λ). The solution of
the resulting equations is
$$
\begin{aligned}
p_{m+1,A} &= \frac{2n_{m,A/A} + n_{m,A/O} + n_{AB}}{2n} \\
p_{m+1,B} &= \frac{2n_{m,B/B} + n_{m,B/O} + n_{AB}}{2n} \\
p_{m+1,O} &= \frac{n_{m,A/O} + n_{m,B/O} + 2n_O}{2n}.
\end{aligned}
$$
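The step from the stationarity equations to this solution can be sketched as follows. The sketch assumes that $n$ denotes the total number of people typed, $n = n_A + n_B + n_{AB} + n_O$, and that the conditional expectations split each phenotype count, so that $n_{m,A/A} + n_{m,A/O} = n_A$ and $n_{m,B/B} + n_{m,B/O} = n_B$. Multiplying the first three equations by $p_A$, $p_B$, and $p_O$, respectively, adding them, and using the constraint $p_A + p_B + p_O = 1$ determines the multiplier:

$$
\begin{aligned}
0 &= \left(2n_{m,A/A} + n_{m,A/O} + n_{AB}\right) + \left(2n_{m,B/B} + n_{m,B/O} + n_{AB}\right) + \left(n_{m,A/O} + n_{m,B/O} + 2n_O\right) + \lambda (p_A + p_B + p_O) \\
  &= 2n_A + 2n_B + 2n_{AB} + 2n_O + \lambda \\
  &= 2n + \lambda .
\end{aligned}
$$

Hence $\lambda = -2n$, and substituting this value back into each of the first three equations and solving for the corresponding allele frequency gives the three updates.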
In other words, the EM update is identical to gene counting.
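A minimal Python sketch of the resulting iteration may make the equivalence concrete. It is an illustration under stated assumptions rather than the author's code: the function names, the starting value, the stopping rule, and the phenotype counts are all hypothetical, and the E step uses the standard Hardy-Weinberg conditional expectation $n_{m,A/A} = n_A \, p_{m,A}^2 / (p_{m,A}^2 + 2 p_{m,A} p_{m,O})$, which is assumed to match the expressions referred to above.

```python
def gene_counting_step(p, counts):
    """One EM (gene-counting) update of the ABO allele frequencies.

    p      -- current estimates (pA, pB, pO)
    counts -- observed phenotype counts (nA, nB, nAB, nO)
    """
    pA, pB, pO = p
    nA, nB, nAB, nO = counts
    n = nA + nB + nAB + nO  # total number of people (assumed meaning of n)

    # E step: split the A and B phenotype counts into expected genotype
    # counts (assumed Hardy-Weinberg form of the conditional expectations).
    nAA = nA * pA**2 / (pA**2 + 2 * pA * pO)
    nAO = nA - nAA
    nBB = nB * pB**2 / (pB**2 + 2 * pB * pO)
    nBO = nB - nBB

    # M step: count alleles among the 2n genes, as in the update formulas.
    new_pA = (2 * nAA + nAO + nAB) / (2 * n)
    new_pB = (2 * nBB + nBO + nAB) / (2 * n)
    new_pO = (nAO + nBO + 2 * nO) / (2 * n)
    return new_pA, new_pB, new_pO


def gene_counting(counts, p0=(1 / 3, 1 / 3, 1 / 3), tol=1e-10, max_iter=1000):
    """Iterate the update until successive estimates differ by less than tol."""
    p = p0
    for _ in range(max_iter):
        new_p = gene_counting_step(p, counts)
        if max(abs(a - b) for a, b in zip(new_p, p)) < tol:
            return new_p
        p = new_p
    return p


# Illustrative phenotype counts (nA, nB, nAB, nO); placeholders only.
counts = (186, 38, 13, 284)
p_hat = gene_counting(counts)
print(p_hat)
```

Each update automatically returns frequencies that sum to 1, because the three numerators add up to $2n$; no explicit renormalization is needed.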
2.6 Classical Segregation Analysis by the EM Algorithm
Classical segregation analysis is used to test Mendelian segregation ratios
in nuclear family data. A nuclear family consists of two parents and their
common offspring. Usually the hypothesis of interest is that some rare disease
shows an autosomal recessive or an autosomal dominant pattern of
inheritance. Because the disease is rare, it is inefficient to collect families
at random. Only families with at least one affected sibling enter a typical
study. The families who come to the attention of an investigator are said
to be ascertained. To test the Mendelian segregation ratio $p = 1/2$ for an
autosomal dominant disease or $p = 1/4$ for an autosomal recessive disease,