Page 207 - Applied Probability

9. Descent Graph Methods
simplest error model posits a uniform distribution of errors over the available genotypes at a single locus. Empirically, this penetrance model appears capable of detecting most typing errors. In reading the bands of a gel, however, errors are not distributed uniformly over all genotypes but tend to cluster. For example, at tandem repeat loci an allele is ordinarily confused with a neighboring allele having one more or one fewer repeat. This kind of error often causes heterozygotes to be mistyped as homozygotes. A uniform distribution over the neighboring genotypes, those determined by neighboring alleles, should model gel reading better. Computational speed would also improve, because this model eliminates some genotypes as mistyping choices for a given genotype. Of course, if geneticists adopt single nucleotide polymorphisms and genotyping chips and discard tandem repeat markers and gels, then the naive uniform model becomes more persuasive.
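To make the contrast concrete, here is a minimal Python sketch of the two penetrance functions Pr(observed genotype | true genotype) at one tandem repeat locus. It assumes alleles are labeled by their integer repeat counts, an overall error rate `eps`, and a hypothetical neighbor rule under which a genotype's neighbors are those obtained by moving exactly one of its alleles up or down one repeat. All function names are illustrative and not taken from any package.

```python
from itertools import combinations_with_replacement


def genotypes(alleles):
    """All unordered genotypes (a, b) with a <= b over the given alleles."""
    return list(combinations_with_replacement(sorted(alleles), 2))


def uniform_error_penetrance(true_g, obs_g, all_genotypes, eps):
    """Uniform model: correct with prob 1 - eps, else uniform over all other genotypes."""
    if obs_g == true_g:
        return 1.0 - eps
    return eps / (len(all_genotypes) - 1)


def neighbors(g, alleles):
    """Genotypes reached by moving one allele of g up or down one repeat.

    This neighbor rule is an assumption made for illustration.
    """
    allele_set = set(alleles)
    a, b = g
    result = set()
    for moved, kept in ((a, b), (b, a)):
        for shifted in (moved - 1, moved + 1):
            if shifted in allele_set:
                result.add(tuple(sorted((shifted, kept))))
    return sorted(result)


def neighbor_error_penetrance(true_g, obs_g, alleles, eps):
    """Neighbor model: correct with prob 1 - eps, else uniform over neighbors only."""
    nbrs = neighbors(true_g, alleles)
    if obs_g == true_g:
        return 1.0 - eps if nbrs else 1.0
    # Non-neighboring genotypes get probability zero and can be skipped entirely.
    return eps / len(nbrs) if obs_g in nbrs else 0.0


if __name__ == "__main__":
    alleles = [10, 11, 12, 13]
    gs = genotypes(alleles)
    true_g = (11, 12)
    for obs_g in gs:
        print(obs_g,
              uniform_error_penetrance(true_g, obs_g, gs, 0.02),
              neighbor_error_penetrance(true_g, obs_g, alleles, 0.02))
```

With four alleles there are ten genotypes, so the uniform model spreads `eps` over nine alternatives, whereas under this neighbor rule the heterozygote (11, 12) can be misread only as (10, 12), (11, 11), (11, 13), or (12, 12), two of which are homozygotes. The zero entries are the genotypes a likelihood computation no longer needs to visit.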
Regardless of the error model, all posterior error probabilities reduce to simple conditional probabilities. Let $M$ denote the collection of observed genotypes in a pedigree and $A_{ij}$ the event that the true genotype and observed genotype at locus $j$ of person $i$ match. The posterior probability of no error at this locus and person is just the conditional probability $\Pr(M \cap A_{ij} \mid M) = \Pr(A_{ij} \mid M)$. Given the correct penetrance function implementing the genotyping error model, one can approximate this conditional probability stochastically as the proportion of time in the Markov chain simulation that the true and observed genotypes match. This is one setting where it is preferable to operate on descent states rather than descent graphs, since this change obviates the need for implementing a time-consuming backtrack scheme to compute the likelihood of each encountered descent graph. If one proceeds deterministically, it is easiest to evaluate $\Pr(M \cap A_{ij})$ and $\Pr(M)$ separately and divide. A trivial adjustment of the genotyping penetrance function accounts for the difference between these probabilities: in computing $\Pr(M \cap A_{ij})$, the penetrance of person $i$ at locus $j$ is set to zero for every true genotype that disagrees with the observed one. For small pedigrees, it helps in the deterministic computations to reduce the set of possible alleles at each locus to those actually seen in the pedigree. This may change posterior probabilities slightly, but the decrease in computing time easily justifies the shortcut.
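The following minimal Python sketch illustrates both routes on the smallest interesting pedigree, a mother, father, and child typed at a single locus. It assumes Hardy-Weinberg founder genotype frequencies, Mendelian transmission, and the uniform error model with rate `eps`; every name in it is hypothetical. The deterministic route sums over all true genotype configurations twice, once with the ordinary penetrances to get $\Pr(M)$ and once with person $i$'s penetrance zeroed whenever true and observed genotypes differ to get $\Pr(M \cap A_{ij})$. The stochastic route is only a stand-in for a descent-state Markov chain: it draws independent samples of the true genotypes from their exact posterior and reports the fraction of draws in which person $i$'s true and observed genotypes match.

```python
import random
from itertools import combinations_with_replacement, product

PEOPLE = ("mother", "father", "child")


def genotypes(freqs):
    """Unordered genotypes (a, b), a <= b, over the alleles in freqs."""
    return list(combinations_with_replacement(sorted(freqs), 2))


def founder_prior(g, freqs):
    """Hardy-Weinberg genotype frequency for a founder."""
    a, b = g
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission(gc, gm, gf):
    """Mendelian probability that parents gm and gf produce child genotype gc."""
    prob = 0.0
    for ma, fa in product(gm, gf):  # each of the 4 transmitted allele pairs has chance 1/4
        if tuple(sorted((ma, fa))) == gc:
            prob += 0.25
    return prob


def penetrance(true_g, obs_g, n_genotypes, eps):
    """Uniform error model: Pr(observed genotype | true genotype)."""
    return 1.0 - eps if obs_g == true_g else eps / (n_genotypes - 1)


def joint_weights(obs, freqs, eps, force_match=None):
    """Yield (true genotypes, joint probability with M) for each configuration.

    If force_match names a person, that person's penetrance is zeroed unless
    true and observed genotypes agree, so the weights sum to Pr(M and A).
    """
    gs = genotypes(freqs)
    for gm, gf, gc in product(gs, repeat=3):
        w = founder_prior(gm, freqs) * founder_prior(gf, freqs) * transmission(gc, gm, gf)
        for person, g in zip(PEOPLE, (gm, gf, gc)):
            pen = penetrance(g, obs[person], len(gs), eps)
            if person == force_match and g != obs[person]:
                pen = 0.0
            w *= pen
        if w > 0.0:
            yield (gm, gf, gc), w


def posterior_no_error(obs, freqs, eps, person):
    """Deterministic route: Pr(A | M) = Pr(M and A) / Pr(M) by exhaustive summation."""
    p_m = sum(w for _, w in joint_weights(obs, freqs, eps))
    p_m_and_a = sum(w for _, w in joint_weights(obs, freqs, eps, force_match=person))
    return p_m_and_a / p_m


def sampled_no_error(obs, freqs, eps, person, n=20000, seed=1):
    """Stochastic route: fraction of posterior draws where true matches observed.

    Independent draws from the exact posterior stand in for a Markov chain here.
    """
    configs, weights = zip(*joint_weights(obs, freqs, eps))
    rng = random.Random(seed)
    idx = PEOPLE.index(person)
    draws = rng.choices(configs, weights=weights, k=n)
    return sum(c[idx] == obs[person] for c in draws) / n


if __name__ == "__main__":
    freqs = {10: 0.5, 11: 0.3, 12: 0.2}
    # Mendelian-inconsistent trio: a (10,10) x (11,11) mating cannot yield (10,10).
    obs = {"mother": (10, 10), "father": (11, 11), "child": (10, 10)}
    for person in PEOPLE:
        exact = posterior_no_error(obs, freqs, 0.02, person)
        approx = sampled_no_error(obs, freqs, 0.02, person)
        print(f"{person}: exact {exact:.4f}, sampled {approx:.4f}")
```

Because the example trio is Mendelian-inconsistent, at least one genotype must be wrong, so the three posterior error probabilities $1 - \Pr(A_{i} \mid M)$ must sum to at least 1. Exhaustive enumeration over all $6^3$ true genotype configurations is feasible only for tiny pedigrees like this one, which is why larger pedigrees call for the stochastic route.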



9.13 Marker Sharing Statistics


Well-designed descent-graph statistics can readily capture excess identity-by-descent sharing among the affected members of a disease pedigree. We have already encountered one such statistic in Chapter 6. The current statistics are better because they exploit multiple linked markers and genotyping results on normal as well as affected members of a pedigree. In computing these new statistics, descent graphs can be sampled exhaustively on small pedigrees or stochastically on large pedigrees [19, 33]. Statistics are scored by sliding a hypothetical trait locus across the marker map. At