Page 207 - Applied Probability
9. Descent Graph Methods
simplest error model posits a uniform distribution of errors over the available
genotypes at a single locus. Empirically, this penetrance model appears
capable of detecting most typing errors. In reading the bands of a gel, errors
are not distributed uniformly over all genotypes but tend to cluster.
For example, with tandem repeat loci, a repeat allele is ordinarily confused
with a neighboring allele with one more or one less repeat. This kind of
error often causes heterozygotes to be mistyped as homozygotes. Taking a
uniform distribution over neighboring genotypes as defined by neighboring
alleles should model gel reading better. Computational speed would also be
enhanced by eliminating some genotypes as mistyping choices for a given
genotype. Of course, if geneticists adopt single nucleotide polymorphisms
and genotyping chips and discard tandem repeat markers and gels, then
the naive uniform model becomes more persuasive.
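The contrast between the two error models can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the text's implementation: alleles are coded as repeat counts 1..n, a genotype is an unordered pair, the names `uniform_penetrance` and `neighbor_penetrance` are hypothetical, and the error mass `error_rate` is assumed to be spread evenly over the permitted wrong genotypes. Two genotypes are treated as neighbors when each allele differs by at most one repeat.

```python
from itertools import combinations_with_replacement


def genotypes(n_alleles):
    """All unordered genotypes over alleles 1..n_alleles (assumed coding)."""
    return list(combinations_with_replacement(range(1, n_alleles + 1), 2))


def neighbors(true_gt, n_alleles):
    """Genotypes reachable by shifting each allele by at most one repeat."""
    a, b = true_gt
    found = set()
    for da in (-1, 0, 1):
        for db in (-1, 0, 1):
            x, y = a + da, b + db
            if 1 <= x <= n_alleles and 1 <= y <= n_alleles:
                found.add(tuple(sorted((x, y))))
    found.discard(tuple(sorted(true_gt)))  # the true genotype is not an error
    return found


def uniform_penetrance(observed, true_gt, n_alleles, error_rate):
    """Pr(observed | true) when errors fall evenly on all other genotypes."""
    observed, true_gt = tuple(sorted(observed)), tuple(sorted(true_gt))
    others = len(genotypes(n_alleles)) - 1
    if others == 0:
        return 1.0  # a single possible genotype cannot be mistyped
    if observed == true_gt:
        return 1.0 - error_rate
    return error_rate / others


def neighbor_penetrance(observed, true_gt, n_alleles, error_rate):
    """Pr(observed | true) when errors fall evenly on neighboring genotypes."""
    observed, true_gt = tuple(sorted(observed)), tuple(sorted(true_gt))
    nbrs = neighbors(true_gt, n_alleles)
    if not nbrs:
        return 1.0 if observed == true_gt else 0.0
    if observed == true_gt:
        return 1.0 - error_rate
    return error_rate / len(nbrs) if observed in nbrs else 0.0
```

With five alleles and true genotype (2, 3), the uniform model spreads the error mass over 14 alternative genotypes, while the neighbor model in this sketch keeps only 7, among them the homozygotes (2, 2) and (3, 3). The remaining genotypes receive penetrance zero and can be skipped, which is the source of the computational saving mentioned above.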
Regardless of the error model, all posterior error probabilities reduce to
simple conditional probabilities. Let M denote the collection of observed
genotypes in a pedigree and $A_{ij}$ the event that the true genotype and
observed genotype at locus j of person i match. The posterior probability
of no error at this locus and person is just the conditional probability
$\Pr(A_{ij} \mid M) = \Pr(M \cap A_{ij}) / \Pr(M)$. Given the correct penetrance function implementing the
genotyping error model, one can approximate this conditional probability
stochastically as the proportion of time in the Markov chain simulation
that the true and observed genotypes match. This is one setting where it
is preferable to operate on descent states rather than descent graphs since
this change obviates the need for implementing a time-consuming backtrack
scheme to compute the likelihood of each encountered descent graph. If one
proceeds deterministically, it is easiest to evaluate $\Pr(M \cap A_{ij})$ and $\Pr(M)$
separately and divide. A trivial adjustment of the genotyping penetrance
function accounts for the difference between these probabilities. For small
pedigrees, it helps in the deterministic computations to reduce the set of
possible alleles at each locus to those actually seen in the pedigree. This
may change posterior probabilities slightly, but the decrease in computing
time easily justifies the shortcut.
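The deterministic route can be illustrated on the smallest interesting pedigree, a father-mother-child trio typed at one locus. The sketch below is an assumption-laden toy rather than the text's algorithm: it enumerates all true genotypes by brute force, assumes Hardy-Weinberg founder priors and the uniform error model, and uses hypothetical names (`trio_likelihood`, `posterior_no_error`, `must_match`). The adjustment of the penetrance function amounts here to zeroing every term in which the chosen person's true genotype differs from the observed one, which turns the sum for Pr(M) into the sum for the joint probability of M and the no-error event.

```python
from itertools import combinations_with_replacement, product


def genotypes(n_alleles):
    """All unordered genotypes over alleles 1..n_alleles (assumed coding)."""
    return list(combinations_with_replacement(range(1, n_alleles + 1), 2))


def hardy_weinberg(gt, freqs):
    """Founder prior for an unordered genotype under Hardy-Weinberg."""
    a, b = gt
    p = freqs[a - 1] * freqs[b - 1]
    return p if a == b else 2.0 * p


def transmission(child, father, mother):
    """Mendelian probability of an unordered child genotype given parents."""
    total = 0.0
    for x in father:
        for y in mother:
            if tuple(sorted((x, y))) == child:
                total += 0.25
    return total


def penetrance(observed, true_gt, n_genotypes, error_rate):
    """Uniform error model: wrong genotypes share error_rate equally."""
    if observed == true_gt:
        return 1.0 - error_rate
    return error_rate / (n_genotypes - 1)


def trio_likelihood(observed, freqs, error_rate, must_match=None):
    """Pr(M), or Pr(M and no error in person must_match) when that is set.

    Persons are indexed 0 = father, 1 = mother, 2 = child.
    """
    gts = genotypes(len(freqs))
    total = 0.0
    for true in product(gts, repeat=3):
        father, mother, child = true
        term = (hardy_weinberg(father, freqs)
                * hardy_weinberg(mother, freqs)
                * transmission(child, father, mother))
        if term == 0.0:
            continue
        for person, (obs, tru) in enumerate(zip(observed, true)):
            if person == must_match and obs != tru:
                term = 0.0  # adjusted penetrance: mismatches are excluded
                break
            term *= penetrance(obs, tru, len(gts), error_rate)
        total += term
    return total


def posterior_no_error(observed, freqs, error_rate, person):
    """Ratio of the restricted likelihood to the full likelihood Pr(M)."""
    joint = trio_likelihood(observed, freqs, error_rate, must_match=person)
    return joint / trio_likelihood(observed, freqs, error_rate)
```

For a Mendelian-inconsistent trio such as observed genotypes ((1, 1), (2, 2), (1, 1)), at least one typing must be wrong, so the three posterior no-error probabilities cannot all be close to 1. For the consistent trio ((1, 1), (2, 2), (1, 2)) with a small error rate, each posterior stays near 1. Brute-force enumeration is only practical for tiny pedigrees, and restricting `freqs` to the alleles actually seen shrinks the genotype list in the way the paragraph above describes.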
9.13 Marker Sharing Statistics
Well-designed descent-graph statistics can readily capture excess
identity-by-descent sharing among the affected members of a disease pedigree. We
have already encountered one such statistic in Chapter 6. The current
statistics are better because they exploit multiple linked markers and
genotyping results on normal as well as affected members of a pedigree. In
computing these new statistics, descent graphs can be sampled exhaustively
on small pedigrees or stochastically on large pedigrees [19, 33]. Statistics
are scored by sliding a hypothetical trait locus across the marker map. At