Page 317 - Applied Probability

14. Poisson Approximation
approximation

$$n \approx \frac{-\ln\left(\frac{1}{d} + 1\right) + \ln\ln 2}{\ln(1 - d)} \approx \frac{1}{d}\ln\frac{1}{d}.$$

In fact, a detailed analysis shows that the average required number of markers is asymptotic to $\frac{1}{d}\ln\frac{1}{d}$ as $d \to 0$ [8, 18]. Since evenly spaced markers would need only about $\frac{1}{d}$, the factor $\ln\frac{1}{d}$ is the penalty exacted for selecting markers at random.

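A quick Monte Carlo experiment can make the $\ln\frac{1}{d}$ penalty concrete. The sketch below is an illustration only: it assumes markers are dropped one at a time, independently and uniformly on $[0, 1]$, and that the two end gaps (from 0 to the first marker and from the last marker to 1) count as gaps. The function names are hypothetical.

```python
import math
import random

def max_gap(points):
    """Largest gap on [0, 1], counting the two end gaps at 0 and 1."""
    edges = [0.0] + sorted(points) + [1.0]
    return max(b - a for a, b in zip(edges, edges[1:]))

def markers_needed(d, rng):
    """Markers dropped uniformly at random until no gap exceeds d (assumes 0 < d < 0.5)."""
    points = [rng.random() for _ in range(int(2 / d) + 1)]
    while max_gap(points) > d:
        # Not covered yet: double the batch of future markers.
        extra = len(points)
        points.extend(rng.random() for _ in range(extra))
    # Adding markers can only shrink gaps, so max_gap(points[:n]) is
    # non-increasing in n; binary search for the first n that covers.
    lo, hi = 1, len(points)
    while lo < hi:
        mid = (lo + hi) // 2
        if max_gap(points[:mid]) <= d:
            hi = mid
        else:
            lo = mid + 1
    return lo

def mean_markers(d, trials=200, seed=1):
    """Monte Carlo estimate of the average number of markers needed."""
    rng = random.Random(seed)
    return sum(markers_needed(d, rng) for _ in range(trials)) / trials

for d in (0.05, 0.02, 0.01):
    print(f"d={d}: simulated {mean_markers(d):.0f}, (1/d)ln(1/d) = {math.log(1 / d) / d:.0f}")
```

Expect the simulated mean to sit noticeably above $\frac{1}{d}\ln\frac{1}{d}$ for moderate $d$: the asymptotic statement only says that the ratio of the two tends to 1 as $d$ shrinks.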
The tedium of filling the last few gaps also plagues other mapping endeavors, such as covering a chromosome by random clones of fixed length $d$ [20]. If we let the center of each clone correspond to a marker, then, except for edge effects, this problem is completely analogous to the marker coverage problem.

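The analogy can be checked directly. Assuming clones are intervals of length $d$ centered at points of $[0, 1]$ (the function names below are hypothetical), the clones cover $[0, 1]$ exactly when every gap between adjacent centers is at most $d$ and, at the edges, the first center lies within $d/2$ of 0 and the last within $d/2$ of 1. The sketch compares a direct left-to-right sweep of the clone intervals against that marker-style criterion on random instances.

```python
import random

def clones_cover(centers, d):
    """Sweep left to right: do clones [c - d/2, c + d/2] cover all of [0, 1]?"""
    reach = 0.0  # right end of the covered stretch [0, reach]
    for c in sorted(centers):
        if c - d / 2 > reach:
            return False  # an uncovered hole sits just left of this clone
        reach = max(reach, c + d / 2)
    return reach >= 1.0

def marker_criterion(centers, d):
    """Marker coverage condition on the centers, plus the two edge conditions."""
    pts = sorted(centers)
    if not pts:
        return False
    interior = all(b - a <= d for a, b in zip(pts, pts[1:]))
    edges = pts[0] <= d / 2 and pts[-1] >= 1 - d / 2
    return interior and edges

rng = random.Random(7)
agree = all(
    clones_cover(cs, 0.1) == marker_criterion(cs, 0.1)
    for cs in ([rng.random() for _ in range(rng.randint(1, 60))] for _ in range(2000))
)
print("sweep and marker criterion agree:", agree)
```

The edge conditions are exactly the "edge effects" the analogy sets aside; away from the ends, clone coverage and marker coverage impose the same constraint on the centers.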
14.6 Randomness of Restriction Sites

Restriction enzymes are special bacterial proteins that snip DNA. The restriction sites where the cutting takes place vary from enzyme to enzyme. For instance, the restriction enzyme EcoRI recognizes the six-base sequence GAATTC and snips DNA wherever this sequence appears. The restriction enzyme NotI recognizes the rarer eight-base sequence GCGGCCGC and consequently tends to produce much longer fragments on average than EcoRI. To a good approximation, the restriction sites for a particular enzyme occur along a chromosome according to a homogeneous Poisson process. Clustering of restriction sites is a particularly interesting violation of the Poisson process assumptions.

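Under the simplest sequence model, in which bases are independent and equally likely (an idealization; real genomes are not uniform), a $k$-base site begins at any given position with probability $4^{-k}$, so the mean spacing between sites is $4^k$: 4,096 bases for a six-base site such as GAATTC and 65,536 for an eight-base site such as GCGGCCGC. The sketch below checks the EcoRI figure on a simulated sequence; the helper names and the random "genome" are hypothetical.

```python
import random

def site_positions(seq: str, site: str) -> list[int]:
    """Start positions of every occurrence of `site` in `seq` (overlaps allowed)."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def mean_fragment_length(seq: str, site: str) -> float:
    """Average distance between consecutive cut sites (interior fragments only)."""
    pos = site_positions(seq, site)
    gaps = [b - a for a, b in zip(pos, pos[1:])]
    return sum(gaps) / len(gaps)

def predicted_fragment_length(site: str) -> int:
    """Mean spacing 4**k for a k-base site when bases are i.i.d. uniform."""
    return 4 ** len(site)

rng = random.Random(2024)
seq = "".join(rng.choices("ACGT", k=3_000_000))  # hypothetical random "genome"

print("EcoRI:", round(mean_fragment_length(seq, "GAATTC")), "vs", predicted_fragment_length("GAATTC"))
print("NotI predicted:", predicted_fragment_length("GCGGCCGC"))
```

The simulated EcoRI spacing should land within a few percent of 4,096; NotI sites are too sparse in a sequence this short to estimate their spacing as precisely.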
If one visualizes $n$ restriction sites along a stretch of DNA as random points on the unit interval $[0, 1]$, then under the Poisson process assumption, the $n$ points should constitute a random sample of size $n$ from the uniform distribution on $[0, 1]$. The distances between adjacent points are known as spacings, or scans. An $m$-spacing is the distance between the first and last points of $m + 1$ adjacent points. In Section 14.5, we approximated the distribution of the largest 1-spacing. Here we are interested in detecting clustering by examining the smallest $m$-spacing $S_m$ from a set of $n$ restriction sites. Values of $m > 1$ are important because very short DNA fragments are difficult to measure exactly. The Chen-Stein method provides a means of assessing the significance of an observed $m$-spacing $S_m = s$ [5, 13].

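As an illustration (a brute-force simulation, not the Chen-Stein calculation itself), the sketch below computes $S_m$ as the minimum of $x_{(i+m)} - x_{(i)}$ over the sorted sites and estimates $P(S_m \le s)$ under the uniform null model; a small estimate suggests that the observed cluster is unusually tight. The function names, the example site positions, and the default number of trials are hypothetical.

```python
import random

def smallest_m_spacing(points, m):
    """S_m: the minimum over i of x_(i+m) - x_(i) for the sorted points."""
    xs = sorted(points)
    if len(xs) < m + 1:
        raise ValueError("need at least m + 1 points")
    return min(xs[i + m] - xs[i] for i in range(len(xs) - m))

def monte_carlo_p_value(s, n, m, trials=10_000, seed=0):
    """Estimate P(S_m <= s) when n sites are i.i.d. uniform on [0, 1]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.random() for _ in range(n)]
        if smallest_m_spacing(sample, m) <= s:
            hits += 1
    return hits / trials

sites = [0.05, 0.12, 0.13, 0.135, 0.6, 0.9]  # made-up example positions
s = smallest_m_spacing(sites, m=2)
print(f"S_2 = {s:.3f}, estimated P(S_2 <= s) = {monte_carlo_p_value(s, n=len(sites), m=2):.4f}")
```

For $n = 2$ and $m = 1$ the exact value is $P(|U_1 - U_2| \le s) = 1 - (1 - s)^2$, which gives a simple check on the simulation.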
Consider the collection $I$ of subsets $\alpha$ of size $m + 1$ from the set of $n$ random points on $[0, 1]$. Let $X_\alpha$ be the indicator random variable of the event that the distance from the first point of $\alpha$ to the last point of $\alpha$ is less than or equal to $s$. There are $|I| = \binom{n}{m+1}$ such subsets $\alpha$, and each