
9
Descent Graph Methods

9.1 Introduction

Mapping disease and marker loci from pedigree phenotypes is one of the most computationally onerous tasks in modern biology. Even tightly optimized software can be quickly overwhelmed by the synergistic obstructions of missing data, multiple marker loci, multiple alleles per marker locus, and inbreeding. This unhappy situation has prompted mathematical and statistical geneticists to formulate alternatives to the Elston-Stewart algorithm. The most productive alternatives use elementary graph theory. The Lander-Green-Kruglyak algorithm alluded to in Chapter 7 exploits gene flow graphs and works well on small pedigrees. For large pedigrees, it is helpful to combine the graph theory perspective with stochastic methods of numerical integration [12, 23, 24, 32, 39, 40, 42].
  One of the advantages of descent graph methods is that they enable us to ask questions about genetic identity by descent. This has implications for computing nonparametric linkage statistics and conditional kinship coefficients. Although the deterministic and stochastic algorithms share a common core, they diverge in several details. The Lander-Green-Kruglyak algorithm makes an interesting detour into Fourier analysis. The stochastic methods can be viewed as part of the Markov chain Monte Carlo (MCMC) revolution sweeping statistics. The Metropolis algorithm and Gibbs sampling make it straightforward to construct a Markov chain that samples from a complicated conditional distribution [7, 8, 10, 13, 28, 38]. Once a sample is available, any conditional expectation can be approximated by its corresponding sample average. The implications of this insight are profound for both classical and Bayesian statistics. As a bonus, trivial changes to the Metropolis algorithm yield simulated annealing, a general-purpose algorithm for solving difficult combinatorial optimization problems such as haplotyping [17, 31].
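
  To make the sample-average idea concrete, the short Python sketch below runs a Metropolis chain on a toy discrete distribution and estimates a mean by averaging the sampled states. The target weights, proposal, and state space here are hypothetical stand-ins, not the descent graph samplers developed later in the chapter; the temperature parameter merely hints at how little must change to obtain simulated annealing.

import random

def metropolis(target, propose, x0, n_steps, temperature=1.0):
    # Metropolis sampler for a distribution proportional to target(x),
    # assuming a symmetric proposal.  At temperature 1 the chain samples
    # from the target; letting the temperature decay toward 0 over the
    # run turns the same loop into simulated annealing, which seeks
    # maxima of target(x).
    x = x0
    samples = []
    for _ in range(n_steps):
        y = propose(x)
        ratio = (target(y) / target(x)) ** (1.0 / temperature)
        if random.random() < min(1.0, ratio):
            x = y  # accept the proposed move; otherwise keep x
        samples.append(x)
    return samples

# Toy example: a distribution on {0, ..., 9} proportional to the
# weights w, sampled by a symmetric random walk on the cycle.
w = [1, 2, 3, 4, 5, 5, 4, 3, 2, 1]
target = lambda x: w[x]
propose = lambda x: (x + random.choice([-1, 1])) % 10

chain = metropolis(target, propose, x0=0, n_steps=100000)
burned = chain[10000:]  # discard burn-in
print(sum(burned) / len(burned))  # sample average estimating E[X] = 4.5

  Replacing the fixed temperature with a schedule that cools toward zero is the only change needed for annealing, a point the chapter returns to when haplotype reconstruction is treated as a combinatorial optimization problem.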
  The agenda for this rather long chapter is to (a) review the existing theory of finite-state Markov chains, (b) briefly explain the Metropolis algorithm and simulated annealing, (c) apply these ideas to the computation of location scores and the reconstruction of haplotypes, (d) develop the Walsh transform and Baum's algorithm underlying the Lander-Green-Kruglyak algorithm, and (e) show how these techniques apply to the computation of nonparametric linkage statistics, conditional kinship coefficients, and genotyping error probabilities.