Page 222 - Applied Probability
P. 222
10. Molecular Phylogeny
208
Analogous simplifications hold when node j is a contemporary taxon and
when nodes i and j are both contemporary taxa.
For three contemporary taxa, it is trivial to verify that the parsimony
score s takes one of the three values 0, 1, or 2. If the observed bases of
the three taxa all agree, then s = 0; if two bases agree and one disagrees,
then s = 1; and if all three disagree, then s = 2. Since the three possible
rooted trees for three contemporary taxa collapse to the same unrooted
tree, these numerical conclusions are consistent with the fact that maximum
parsimony is actually a device for distinguishing unrooted trees. Problem
3 explores the more informative situation with four contemporary taxa.
Computation of the maximum parsimony score is best accomplished by
recursively traversing the nodes of the underlying evolutionary tree in a
postorder fashion [18]. In a recursive traversal, we view each internal node
as possessing a left and right subtree. In the maximum parsimony postorder
traversal, we first compute recursively the scores s i (b i ) for nodes in the left
subtree of the root, then we compute recursively the scores s i (b i ) for nodes
in the right subtree of the root, and finally we compute the scores at the
root. The recursive nature of postorder traversal guarantees that the we
visit the tips first and then move upward through the tree. In a preorder
traversal, we visit the root first, then its left subtree recursively, and finally
its right subtree recursively. Thus, postorder traversal is bottom-up, and
preorder traversal is top-down.
A preorder traversal is convenient in assigning a set of bases that realize
the maximum parsimony score. To assign the optimal bases, we first con-
struct a system of pointers in the postorder traversal. When we visit node
k, we connect base b k to the bases b i and b j at the daughter nodes i and
j that furnish the minimum of the right-hand side of equation (10.1). In
the case of a tie, we choose a pair (b i ,b j ) arbitrarily among the best pairs.
The preorder traversal commences by choosing an optimal base at the root.
We then follow the constructed pointers recursively down through the left
subtree and then recursively down through the right subtree. As we visit
each internal node, the appropriate pointer from its parent node identifies
a base compatible with the maximum parsimony score.
In spite of maximum parsimony’s speed and intuitive appeal, it can be
misleading in extreme cases. Felsenstein [5] points out that maximum par-
simony even fails the basic test of statistical consistency. In the remainder
of this chapter, we focus on building a parametric model for the evolution-
ary changes at a single DNA site and implementing maximum likelihood
estimation within the context of this model [6].