Page 173 - Macromolecular Crystallography
refinement programs use stereochemical parameters derived from the Engh and Huber (1991) dictionary. Refinement is thus the optimization of the model parameters to simultaneously fit both the experimental diffraction observations and a set of a priori known stereochemical information. It should be clear from the formula above that solvent content crucially affects the observation-to-parameter ratio. Since crystals that diffract to lower resolution typically have higher solvent content, refinement can often be carried out reasonably well even at nominally low resolution.

The crystallographic R factor, defined as

\[
R = \frac{\sum_{hkl} \left| F^{\mathrm{obs}}_{hkl} - F^{\mathrm{calc}}_{hkl} \right|}{\sum_{hkl} F^{\mathrm{obs}}_{hkl}}
\]

is used to measure the fit between the model and the observed data and to monitor the refinement progress. In addition to the standard crystallographic R factor, the so-called free R value can be used as a cross-validating indicator to monitor the overall progress and to avoid fitting to noise (Brunger, 1992). The free R factor is, in principle, the same as the standard R value, but it is computed only for a small subset of the data that is never used during the refinement process. Because the model has not been influenced by this set of intensities, the free R factor can be taken as an independent evaluation of the quality of fit. This technique falls in the general category of bootstrapping methods for validation. Full cross-validation by bootstrapping would require that all data are tested in turn: a model is first refined with, for example, 5% of the data excluded for validation, and the refinement is then repeated several times from the beginning using different subsets of ‘free’ data. In practice, however, this would be inefficient and would require huge CPU costs for thorough validation; a single ‘free’ set is in general sufficient for crystallographic refinement and is typically employed in structure determination.

The use of cross-validation should be widely recommended for crystallographic model refinement. Its simplicity, which makes it easy for the non-expert to understand, and its power in discriminating models that are consistent with the experimental data make it indispensable. However, in modern crystallography model refinement is often falsely seen as the task of minimizing the value of the free R factor, instead of correctly obtaining the model that best describes the data. One may decide to run several model refinement protocols in parallel and choose the one with the lowest free R factor. However, it is important to understand that from this point onwards the free R factor becomes biased towards the chosen protocol.

A misconception, which still occurs in the literature, is that cross-validation could serve as a local indicator rather than a global guide for the whole structure determination protocol. Cross-validation cannot be sufficiently sensitive to indicate whether a particular side chain should be built into one or another conformation.

In using cross-validation it is essential to avoid, or at least minimize, bias to the free R factor itself. In the era of emerging automated procedures for modelling and refinement, a frequent mistake is to set aside a fraction of reflections for minimization of the residual in reciprocal space and, at the same time, to use all data for computation of electron density and model rebuilding. Since local adjustment of the model in real space is equivalent to global phase adjustment in reciprocal space, the free reflection set becomes biased towards the current model and loses its validation credibility.

From a totally different perspective, omitting the free set of reflections from the experimental observations effectively reduces the observation-to-parameter ratio and can adversely affect the refinement. Clearly, an optimization problem crucially depends on the number of observations that are used to find optimal values for the model parameters. If done by a crystallographer, the result may depend on human creativity and skill combined with wishful thinking. The danger of over-interpretation is normally not a caveat of automated model building algorithms, and cross-validation may not be necessary since the growing polypeptide chains of traced residues can themselves be seen as an independent cross-validation criterion. Based on one of the authors’ subjective experience (i.e. not fully cross-validated), it can be advised that when an objective model building software package is used in parallel with maximum likelihood refinement, it may be of advantage not to set aside the free set for validation but instead to use all the data. This can diminish
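As a minimal sketch of the R factor and the free R value defined earlier in this section, the following Python fragment computes both for a list of structure factor amplitudes. The function names (`r_factor`, `split_free_set`, `r_work_and_r_free`), the fixed random seed, and the toy data are assumptions made for illustration only; they are not taken from any refinement program, and the 5% free fraction simply follows the example quoted in the text.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(|F_obs - F_calc|) / sum(F_obs) over the supplied reflections."""
    numerator = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    denominator = sum(f_obs)
    return numerator / denominator


def split_free_set(n_reflections, free_fraction=0.05, seed=0):
    """Choose the 'free' reflections once, at random; return (work, free) index lists.

    The free indices must then be kept out of every later refinement and
    map calculation, otherwise the free R value loses its independence.
    """
    rng = random.Random(seed)  # fixed seed is an illustrative assumption
    n_free = max(1, round(n_reflections * free_fraction))
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


def r_work_and_r_free(f_obs, f_calc, work, free):
    """Evaluate the same R formula separately on the working and free sets."""
    def pick(values, indices):
        return [values[i] for i in indices]

    r_work = r_factor(pick(f_obs, work), pick(f_calc, work))
    r_free = r_factor(pick(f_obs, free), pick(f_calc, free))
    return r_work, r_free


if __name__ == "__main__":
    # Toy amplitudes: F_calc is F_obs perturbed by random noise.
    rng = random.Random(1)
    f_obs = [rng.uniform(10.0, 100.0) for _ in range(200)]
    f_calc = [fo * rng.uniform(0.8, 1.2) for fo in f_obs]
    work, free = split_free_set(len(f_obs))
    r_work, r_free = r_work_and_r_free(f_obs, f_calc, work, free)
    print(f"R work = {r_work:.3f}, R free = {r_free:.3f}")
```

In this toy case the model amplitudes were never fitted to the working set, so the two values differ only by sampling; in a real refinement the gap between them is what signals fitting to noise.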