Page 171 - Macromolecular Crystallography

discriminate between the classes using the features from a training set. A training set is a set of objects with already assigned classes. Supervised learning therefore implies the availability of prior knowledge for construction of the training set. It is explicitly known what the result of the classification should be. Concurrently, a test set is kept apart to validate the classifier, while developing it, against data that have not been used for this training. A validation set is kept apart for the final testing of the developed algorithms. In macromolecular crystallography, having a large set of structures and their experimental data available will provide an invaluable resource not only for supervised learning but also for general algorithm development. There is a wide collection of classifiers (or classifier functions) that can be used for supervised learning, depending on the nature of the features: neural networks (e.g. a multilayer perceptron); likelihood-based methods (including linear and quadratic discriminators); nearest neighbour methods (e.g. a learning vector quantizer); and rule-based classifiers (such as binary trees and support vector machines). These methods are easily accessible for testing through publicly available software (e.g. Lippmann et al., 1993). Any ensemble of classifiers can be linked to form a committee, where each classifier has a ‘vote’ and, if enough classifiers reach the same conclusion, this result is taken. In many practical implementations it is preferred that each classifier is given a smooth weight (rather than a binary ‘vote’) so that the ‘votes’ of all committee ‘members’ are used to the extent of their reliability.

As in the above example in which the ‘wine quality’ is a somewhat ill-defined concept subject to individual taste, many classification schemes are heavily biased by the viewpoint of the researcher (and this can influence the performance). Unsupervised learning (e.g. Ritter et al., 1992) largely avoids this bias, but at the cost of often less powerful methods and of classes that come without an interpretation: it is often not obvious what the resulting classes represent.

Similar to optimization, the field of pattern recognition is well developed with many high-quality textbooks (e.g. Theodoridis and Koutroumbas, 2006; Duda et al., 2000) and some excellent and user-friendly software packages.

11.2.3 Crystallographic model building

The aim of crystallographic model building is to construct a model that explains the experimental data under the condition that it should make physical and chemical sense. To build a crystallographic macromolecular model, the most crucial information is the reconstructed electron density map. Interpretation of a map consists of examining topological features of the density and using knowledge of the chemical nature of macromolecules to determine the atomic positions and their connectivity. Before the development and availability of automated software approaches, this was a cumbersome task that could easily take many months and required experience in protein structure and chemistry, and often a vivid imagination.

In order to construct an electron density map, into which the macromolecular model should be built, both the amplitudes and the phases of the diffracted X-rays need to be known. Since only the amplitudes are directly attainable in a diffraction experiment, obtaining phases (solving the so-called phase problem) plays a central role in crystallography. The initial phase estimates, provided either by the use of a homologous model, i.e. molecular replacement (Turkenburg and Dodson, 1996), or by experimental techniques involving the use of heavy atoms and synchrotron radiation, (M/S)IR(AS) and (S/M)AD (Ogata, 1998), are often of rather poor quality and result in inadequate electron density maps. The model that can be built into such maps may be incomplete and even, in parts, incorrect. It may therefore require rounds of extensive refinement combined with model rebuilding. Visual examination of the electron density and manual adjustment of the current model is a tedious, time-consuming, and heavily subjective step that relies on user experience. It has recently been eased by automated methods, and these and other related developments are discussed below.

11.2.4 Crystallographic refinement

Macromolecular crystallographic refinement is an example of a restrained optimization problem. Standard refinement programs adjust the atomic positions and, typically, also their atomic displacement parameters of a given model with the
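The difference between a committee with binary votes and one with smooth weights can be made concrete with a minimal sketch. The function names, the class labels ('helix', 'strand'), and the idea of using a per-classifier reliability score as the weight are assumptions made for illustration only; they are not taken from any particular software package.

```python
from collections import defaultdict


def majority_vote(predictions):
    """Binary voting: every classifier counts equally.

    Ties are resolved in favour of the label that was seen first.
    """
    counts = defaultdict(int)
    for label in predictions:
        counts[label] += 1
    return max(counts, key=counts.get)


def weighted_vote(predictions, weights):
    """Smooth weighting: each classifier contributes according to its weight.

    Returns a dict mapping each label to its share of the total weight.
    The weights are assumed to be non-negative reliability scores
    (for example, accuracies measured on a test set).
    """
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per classifier")
    scores = defaultdict(float)
    for label, weight in zip(predictions, weights):
        scores[label] += weight
    total = sum(scores.values())
    return {label: score / total for label, score in scores.items()}


# Three hypothetical classifiers label the same object.
predictions = ["helix", "strand", "strand"]
reliabilities = [0.9, 0.3, 0.4]  # assumed reliability of each classifier

binary_result = majority_vote(predictions)
shares = weighted_vote(predictions, reliabilities)
weighted_result = max(shares, key=shares.get)
```

In this toy case the two unreliable classifiers outvote the reliable one under binary voting, whereas the weighted committee follows the single reliable member. How the weights are obtained in practice is left open here.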
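Why both amplitudes and phases are needed can be illustrated with a one-dimensional toy Fourier synthesis. This is only a sketch under simplifying assumptions: a real map is a three-dimensional sum over Miller indices scaled by the unit cell volume, the sign convention of the exponent is chosen here for convenience, and the density values and function names are invented for the example.

```python
import cmath
import math


def structure_factors(density):
    """F(h) = sum_x rho(x) * exp(+2*pi*i*h*x/N) for a 1D density on N grid points."""
    n = len(density)
    return [
        sum(rho * cmath.exp(2j * math.pi * h * x / n) for x, rho in enumerate(density))
        for h in range(n)
    ]


def fourier_synthesis(amplitudes, phases):
    """rho(x) = (1/N) * sum_h |F(h)| * exp(i*phi(h)) * exp(-2*pi*i*h*x/N)."""
    n = len(amplitudes)
    return [
        sum(
            a * cmath.exp(1j * p) * cmath.exp(-2j * math.pi * h * x / n)
            for h, (a, p) in enumerate(zip(amplitudes, phases))
        ).real / n
        for x in range(n)
    ]


# Toy 'unit cell' with two atoms of different weight (invented values).
true_density = [0.0, 0.0, 5.0, 0.0, 0.0, 0.0, 2.0, 0.0]

factors = structure_factors(true_density)
amplitudes = [abs(f) for f in factors]        # what the experiment measures
phases = [cmath.phase(f) for f in factors]    # what the experiment loses

# With the correct phases the density is recovered.
with_phases = fourier_synthesis(amplitudes, phases)

# With all phases set to zero, the same amplitudes give a different map.
without_phases = fourier_synthesis(amplitudes, [0.0] * len(amplitudes))
```

With the true phases the synthesis reproduces the two peaks; with the phases set to zero the density piles up at the origin and the heavier atom at grid point 2 disappears, even though the amplitudes are unchanged.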