Page 171 - Macromolecular Crystallography
discriminate between the classes using the features from a training set. A training set is a set of objects with already assigned classes. Supervised learning therefore implies the availability of prior knowledge for construction of the training set: it is explicitly known what the result of the classification should be. Concurrently, a test set is kept apart to validate the classifier, while it is being developed, against data that have not been used for training. A validation set is kept apart for the final testing of the developed algorithms. In macromolecular crystallography, the availability of a large set of structures and their experimental data provides an invaluable resource not only for supervised learning but also for general algorithm development. There is a wide collection of classifiers (or classifier functions) that can be used for supervised learning, depending on the nature of the features: neural networks (e.g. a multilayer perceptron); likelihood-based methods (including linear and quadratic discriminators); nearest neighbour methods (e.g. a learning vector quantizer); and rule-based classifiers (such as binary trees and support vector machines). These methods are easily accessible for testing through publicly available software (e.g. Lippmann et al., 1993). Any ensemble of classifiers can be linked to form a committee, where each classifier has a ‘vote’ and, if enough classifiers reach the same conclusion, this result is taken. In many practical implementations it is preferred that each classifier is given a smooth weight (rather than a binary ‘vote’) so that the ‘votes’ of all committee ‘members’ are used to the extent of their reliability.

As in the above example, in which ‘wine quality’ is a somewhat ill-defined concept subject to individual taste, classification schemes are often heavily biased by the viewpoint of the researcher (and this can influence the performance). Unsupervised learning (e.g. Ritter et al., 1992) largely avoids this bias, but at the cost of methods that are often less powerful and of classes that lack a ready interpretation: it is often not obvious what the resulting classes represent.

Similar to optimization, the field of pattern recognition is well developed, with many high-quality textbooks (e.g. Theodoridis and Koutroumbas, 2006; Duda et al., 2000) and some excellent and user-friendly software packages.

11.2.3 Crystallographic model building

The aim of crystallographic model building is to construct a model that explains the experimental data under the condition that it should make physical and chemical sense. To build a crystallographic macromolecular model, the most crucial information is the reconstructed electron density map. Interpretation of a map consists of examining topological features of the density and using knowledge of the chemical nature of macromolecules to determine the atomic positions and their connectivity. Before the development and availability of automated software approaches, this was a cumbersome task that could easily take many months and required experience in protein structure and chemistry, and often a vivid imagination.

In order to construct an electron density map, into which the macromolecular model should be built, both the amplitudes and the phases of the diffracted X-rays need to be known. Since only the amplitudes are directly attainable in a diffraction experiment, obtaining phases (solving the so-called phase problem) plays a central role in crystallography. The initial phase estimates, provided either by the use of a homologous model (molecular replacement; Turkenburg and Dodson, 1996) or by experimental techniques involving the use of heavy atoms and synchrotron radiation, (M/S)IR(AS) and (S/M)AD (Ogata, 1998), are often of rather poor quality and result in inadequate electron density maps. The model that can be built into such maps may be incomplete and even, in parts, incorrect. It may therefore require rounds of extensive refinement combined with model rebuilding. Visual examination of the electron density and manual adjustment of the current model is a tedious, time-demanding, and heavily subjective step that relies on user experience. It has recently been eased by automated methods, and these and other related developments are discussed below.

11.2.4 Crystallographic refinement

Macromolecular crystallographic refinement is an example of a restrained optimization problem. Standard refinement programs adjust the atomic positions and, typically, also the atomic displacement parameters of a given model with the
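
As a rough illustration of why both amplitudes and phases are needed to construct an electron density map, the following one-dimensional toy sketch may help. It is not a crystallographic program: the two-atom cell, the scattering weights, the grid size and all function names are assumptions made purely for illustration. The density is computed by Fourier synthesis, first with the correct phases and then with the same amplitudes but all phases set to zero.

```python
import cmath
import math


def structure_factors(atoms, h_max):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for a 1D toy cell.

    atoms: list of (fractional position x_j, scattering weight f_j).
    """
    return {
        h: sum(f * cmath.exp(2j * math.pi * h * x) for x, f in atoms)
        for h in range(-h_max, h_max + 1)
    }


def density(amplitudes, phases, n_grid):
    """Fourier synthesis on a grid:
    rho(x) = sum_h |F(h)| * exp(i*phi(h)) * exp(-2*pi*i*h*x).
    """
    rho = []
    for k in range(n_grid):
        x = k / n_grid
        total = sum(
            amplitudes[h]
            * cmath.exp(1j * phases[h])
            * cmath.exp(-2j * math.pi * h * x)
            for h in amplitudes
        )
        rho.append(total.real)
    return rho


def peak_position(rho):
    """Fractional coordinate of the highest density grid point."""
    return max(range(len(rho)), key=lambda k: rho[k]) / len(rho)


# Hypothetical 'structure': a heavy atom at x = 0.30 and a light one at x = 0.70.
atoms = [(0.30, 8.0), (0.70, 3.0)]
F = structure_factors(atoms, h_max=10)

amplitudes = {h: abs(value) for h, value in F.items()}            # measurable
true_phases = {h: cmath.phase(value) for h, value in F.items()}   # not measured
zero_phases = {h: 0.0 for h in F}                                 # deliberately poor guess

rho_true = density(amplitudes, true_phases, n_grid=100)
rho_zero = density(amplitudes, zero_phases, n_grid=100)

print(peak_position(rho_true))  # highest peak falls on the heavy atom site
print(peak_position(rho_zero))  # highest peak falls on the origin, where no atom sits
```

With identical amplitudes, the map is interpretable only when the phases are right, which is the sense in which poor initial phase estimates lead to inadequate maps.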
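
The committee scheme described above, in which classifier votes are either counted equally or weighted by reliability, can be sketched in a few lines of Python. This is a minimal illustration, not the implementation of any particular package: the function name `committee_vote`, the `quorum` threshold, the class labels and the weights are all assumptions introduced here.

```python
from collections import defaultdict


def committee_vote(predictions, weights=None, quorum=0.5):
    """Combine the class labels predicted by several classifiers.

    predictions: list of labels, one per committee member.
    weights: optional list of non-negative reliabilities (hypothetical values);
             None gives every member one equal vote.
    quorum: fraction of the total weight a label must exceed to be accepted.
    Returns the winning label, or None if no label exceeds the quorum.
    """
    if weights is None:
        weights = [1.0] * len(predictions)
    if len(weights) != len(predictions):
        raise ValueError("need exactly one weight per prediction")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    # Accumulate the weight behind each candidate label.
    tally = defaultdict(float)
    for label, weight in zip(predictions, weights):
        tally[label] += weight

    best_label, best_score = max(tally.items(), key=lambda item: item[1])
    return best_label if best_score / total > quorum else None


# Three members label the same object; the labels are illustrative only.
votes = ["helix", "strand", "strand"]
print(committee_vote(votes))                   # equal votes: the majority label wins
print(committee_vote(votes, [0.9, 0.2, 0.3]))  # weighted: the reliable member prevails
```

With equal votes the majority label is returned, whereas with weights a single highly reliable member can outvote two unreliable ones. How the reliabilities are estimated (for example from performance on the test set) is left open here.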