Page 157 - Macromolecular Crystallography

structure a good guess of the correct histogram can be obtained, resulting in the following equation:

$$
H\bigl(\rho(x_{\mathrm{protein}})\bigr) = H_{\mathrm{obs}}\!\left(\frac{1}{V}\sum_{j=0}^{2N} F(h_j)\, e^{\,i\varphi(h_j) - 2i\pi\, x_{\mathrm{protein}} \cdot h_j}\right) \tag{6}
$$

Here $x_{\mathrm{protein}}$ is a real space coordinate within the protein region, $H(\rho(x))$ is the expected, non-Gaussian histogram of the electron density and $H_{\mathrm{obs}}(\rho(x))$ is the observed histogram of protein density, which may or may not have phase errors.

Equation 6 cannot be substituted into Eq. 1 and therefore it does not further reduce the number of unknowns. However, it does provide additional equations, their number being determined by the number of independent grid points within the unique protein region. Its effectiveness is determined by the difference between the theoretical histogram of a protein at a given resolution and that of randomly phased data.

10.5 The practice of phase refinement: Fourier cycling

In theory, density modification could produce perfect phases if Eqs 2 to 6 are sufficiently restrictive. Let us illustrate this by an example: assume a crystal with 50% solvent and three-fold non-crystallographic symmetry. There are 2N Fourier summations (Eq. 1) with 4N unknowns. After substitution with Friedel's Law (Eq. 2), only N phases remain unknown, so now there are 3N unknowns in total. For all phases we have experimental information encoded in Hendrickson–Lattman coefficients (Eq. 3), so we can add N equations to our set. As we know the location of the solvent region, we can reduce the number of unknown densities at independent grid points from 2N to N upon substituting with Eq. 4. Non-crystallographic symmetry further reduces the number of unknown densities to N/3 by substituting with Eq. 5. Histogram matching can further reduce the search space of solutions and improve convergence.

If there are more equations than unknowns, why then can we not determine phases accurately without building an atomic model? Well, the additional equations of type 3 and 6 are of a statistical nature and therefore may be less restrictive. Furthermore, inaccuracies in determining the solvent mask and the non-crystallographic symmetry operators and masks will compromise the procedure. Nevertheless, the additional information imposed often leads to substantial improvement.

Unfortunately, the relations between the electron density, the restraints we have discussed here, and the structure factors are non-linear. Thus, the only strategy we can adopt is to use the approximate phases we start out with and improve these iteratively. Even this is not straightforward, mainly because Eq. 1 is expensive to compute. However, there exists a powerful and straightforward procedure that is used in virtually all phase refinement programs: Fourier cycling.

In Fourier cycling, the approximate phases we have available at the beginning of the process of density modification are used to calculate an initial map. The real space restraints (solvent flatness, non-crystallographic symmetry averaging, and histogram matching) are imposed on this initial density map. After Fourier transformation, the structure factors obtained typically no longer obey the reciprocal space constraints such as the measured amplitude and the phase probability distribution. Therefore existing reciprocal space restraints are recombined with the phase probability distribution obtained after back transformation of the restrained electron density. These modified structure factors are used to calculate a new map. This new map may, in turn, no longer obey the real space restraints, so these are reimposed. The procedure is repeated until it converges on a density map satisfying the equations available as well as is possible. In Fig. 10.1, a flow chart of the process of Fourier cycling is shown.

Before we can understand why Fourier cycling works, we have to deepen our understanding of the Fourier transform. In particular, we need to understand the effects of modifying density on the structure factor amplitudes and phases. The mathematical tool that describes this is the convolution operator.

Convolution is a commonly used mathematical technique that takes as input two functions, say A(x) and B(x). To convolute A(x) with B(x), first take the function A(x) and place it at the origin of the second
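
Returning to the Fourier cycling loop described earlier in this section, a minimal one-dimensional sketch in Python may help make the alternation between real space and reciprocal space concrete. It is an illustration under strong simplifying assumptions, not the procedure of any particular phase refinement program: the only real space restraint imposed is solvent flatness (no non-crystallographic symmetry averaging or histogram matching), the solvent mask is taken as known exactly, and the recombination with a phase probability distribution is replaced by simply keeping the back-transformed phases while restoring the observed amplitudes. The function name `fourier_cycle` and the toy density are hypothetical.

```python
import numpy as np

def fourier_cycle(f_obs, start_phases, solvent_mask, n_cycles=50):
    """Toy 1D Fourier cycling: solvent flattening alternated with
    restoration of the observed amplitudes (simplified phase combination)."""
    n = solvent_mask.size
    f = f_obs * np.exp(1j * start_phases)        # initial structure factors
    residuals = []
    for _ in range(n_cycles):
        rho = np.fft.irfft(f, n=n)               # map from current structure factors
        # RMS density left in the solvent region before the restraint is imposed
        residuals.append(np.sqrt(np.mean(rho[solvent_mask] ** 2)))
        rho[solvent_mask] = 0.0                  # impose the real space restraint
        g = np.fft.rfft(rho)                     # back-transform the modified map
        # reciprocal space restraint: observed amplitudes, updated phases
        f = f_obs * np.exp(1j * np.angle(g))
    return f, np.array(residuals)

# Toy "crystal": 128 grid points, 50% solvent, a few Gaussian "atoms".
rng = np.random.default_rng(0)
n = 128
x = np.arange(n)
solvent_mask = x >= n // 2
rho_true = np.zeros(n)
for centre in (10, 25, 40, 52):
    rho_true += np.exp(-0.5 * ((x - centre) / 1.5) ** 2)
rho_true[solvent_mask] = 0.0                     # perfectly flat solvent (assumption)

f_true = np.fft.rfft(rho_true)
f_obs = np.abs(f_true)                           # "measured" amplitudes
noise = rng.normal(0.0, 1.0, f_true.size)        # phase errors in radians
noise[0] = 0.0                                   # zero-frequency and Nyquist terms
noise[-1] = 0.0                                  # of a real map must stay real
start_phases = np.angle(f_true) + noise

f_final, residuals = fourier_cycle(f_obs, start_phases, solvent_mask)
print("solvent RMS, first cycle:", residuals[0])
print("solvent RMS, last cycle: ", residuals[-1])
```

In this simplified scheme each step moves the map to the nearest one that satisfies the restraint being imposed, so the residual density in the solvent region should not increase from one cycle to the next. A small residual does not by itself guarantee correct phases, which is one reason practical programs weight the new phases against the experimental phase probability distribution rather than accepting them outright.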