Page 173 - Macromolecular Crystallography

162  MACROMOLECULAR CRYSTALLOGRAPHY

refinement programs use stereochemical parameters derived from the Engh and Huber (1991) dictionary.

Refinement is thus the optimization of the model parameters to simultaneously fit both the experimental diffraction observations and a set of a priori known stereochemical information. It should become clear from the formula above that solvent content crucially affects the observation-to-parameter ratio. Since crystals that diffract to lower resolution typically have higher solvent content, refinement can often be carried out decently well even at nominally low resolution.

The crystallographic R factor, defined as

$$R = \frac{\sum_{hkl} \left| F^{obs}_{hkl} - F^{calc}_{hkl} \right|}{\sum_{hkl} F^{obs}_{hkl}}$$

is used to measure the fit between the model and the observed data and to monitor the refinement progress. In addition to the standard crystallographic R factor, the so-called free R value can be used as a cross-validating indicator to monitor the overall progress and to avoid fitting to noise (Brünger, 1992). The free R factor is, in principle, the same as the standard R value, but computed only for a small subset of the data that are never used throughout the refinement process. It can therefore be taken as an independent evaluation of the quality of fit, as the model has not been influenced by this set of intensities; a technique that falls in the general category of bootstrapping methods for validation. Full cross-validation by bootstrapping would require that all data in turn are tested, that is, a model is first refined with, for example, 5% of the data excluded for validation, and the refinement is then repeated several times from the beginning using different subsets of ‘free’ data. In practice, however, this would not be efficient and would require huge CPU costs for thorough validation; a single ‘free’ set is in general sufficient for crystallographic refinement and is typically employed in structure determination.

The use of cross-validation should be widely recommended for crystallographic model refinement. Its simplicity to understand for the non-expert and its power in discriminating models that are consistent with the experimental data make it indispensable. However, in modern crystallography model refinement is often falsely seen as the task to minimize the value of R free, instead of correctly obtaining the model that best describes the data. One may decide to run several model refinement protocols in parallel and choose the one with the lowest R free factor. However, it is important to understand that from this point onwards the free R factor becomes biased to the chosen protocol.

A misconception, which still occurs in the literature, is that cross-validation could serve as a local indicator rather than a global guide for the whole structure determination protocol. Cross-validation cannot be sufficiently sensitive to indicate whether a particular side chain could be built into one or another conformation.

In using cross-validation it is essential to avoid, or at least minimize, bias to the free R factor itself. In the era of emerging automated procedures for modelling and refinement, a frequent mistake is to set aside a fraction of reflections from minimization of the residual in reciprocal space and, at the same time, to use all data for computation of electron density and model rebuilding. Since local adjustment of the model in real space is equivalent to global phase adjustment in reciprocal space, the free reflection set becomes biased towards the current model and loses its validation credibility.

From a totally different perspective, omitting the free set of reflections from the experimental observations effectively reduces the observation-to-parameter ratio and can adversely affect the refinement. Clearly, an optimization problem crucially depends on the number of observations that are used to find optimal values for the model parameters. If done by a crystallographer, the result may depend on the human creativity and skills combined with wishful thinking. The danger of over-interpretation is normally not a caveat of automated model building algorithms, and cross-validation may not be necessary since the growing polypeptide chains of traced residues can themselves be seen as an independent cross-validation criterion. Based on one of the authors’ subjective experience (i.e. not fully cross-validated) it can be advised that when an objective model building software package is used in parallel with maximum likelihood refinement, it may be of advantage to not set aside the free set for validation but instead use all the data. This can diminish
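
As a rough illustration of the R factor and the free R value described on this page, the following minimal Python sketch evaluates the same residual on a working set and on a randomly chosen ‘free’ set. The function names (`r_factor`, `split_free_set`, `r_work_and_r_free`), the 5% default fraction and the made-up amplitudes are assumptions for illustration only; no scale factor between observed and calculated amplitudes is applied, and this is not the procedure of any particular refinement program.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum|Fobs - Fcalc| / sum(Fobs), assuming non-negative amplitudes
    that are already on a common scale (no scale factor is applied here)."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Randomly pick indices for the 'free' set; the rest form the working set."""
    rng = random.Random(seed)
    n_free = max(1, round(n_reflections * free_fraction))
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


def r_work_and_r_free(f_obs, f_calc, work, free):
    """Evaluate the same R formula separately on the working and free sets."""
    def pick(values, indices):
        return [values[i] for i in indices]

    r_work = r_factor(pick(f_obs, work), pick(f_calc, work))
    r_free = r_factor(pick(f_obs, free), pick(f_calc, free))
    return r_work, r_free


if __name__ == "__main__":
    # Made-up amplitudes, for illustration only (not real diffraction data).
    rng = random.Random(1)
    f_obs = [rng.uniform(10.0, 100.0) for _ in range(200)]
    f_calc = [fo * rng.uniform(0.8, 1.2) for fo in f_obs]

    work, free = split_free_set(len(f_obs))
    r_work, r_free = r_work_and_r_free(f_obs, f_calc, work, free)
    print(f"R_work = {r_work:.3f}, R_free = {r_free:.3f}")
```

In this sketch the calculated amplitudes are not refined at all, so the two values differ only by sampling. In an actual refinement only the working set would drive the optimization, and, as the text stresses, the free set would also have to be kept out of map calculation and model rebuilding for the free R value to remain an unbiased indicator.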