
A better alternative is often the conjugate gradient method. Newton's method uses gradient and Hessian information to compute the next point, in effect approximating the objective function at each point by a quadratic function and finding the minimum of that quadratic. A drawback is that the Hessian may not be easy to compute, and even more so its inversion, which is needed for the step computation. The matrix inversion can be an unstable and computationally very expensive operation. In this context the quasi-Newton, Gauss–Newton and Levenberg–Marquardt methods, and especially the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method, should be mentioned. The interested reader is advised to consult the specialized literature for further details, or Tronrud (2004) for a crystallographic overview. In particular, the Numerical Recipes books (Press et al., 2002) provide a good introduction to many important approaches, with implementation details.
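As a concrete illustration, the following minimal sketch uses SciPy's general-purpose optimizers (not any crystallographic package; the exponential-decay model and all numbers are invented for the example). It shows a quasi-Newton (BFGS) minimization and a Levenberg–Marquardt least-squares fit; neither requires forming or inverting the Hessian explicitly.

import numpy as np
from scipy.optimize import minimize, least_squares, rosen, rosen_der

# Quasi-Newton (BFGS): builds an approximation to the inverse Hessian
# from successive gradients, so no explicit Hessian inversion is needed.
result = minimize(rosen, x0=np.array([-1.2, 1.0]), jac=rosen_der, method='BFGS')
print(result.x)            # approaches the minimum of the Rosenbrock function at (1, 1)

# Levenberg-Marquardt: for least-squares objectives, interpolates between
# Gauss-Newton steps and gradient-descent steps.
def residuals(p, t, y):
    """Residuals of a hypothetical exponential-decay model y = a * exp(-b t)."""
    a, b = p
    return a * np.exp(-b * t) - y

t = np.linspace(0.0, 4.0, 50)
y = 2.5 * np.exp(-1.3 * t) + 0.05 * np.random.default_rng(0).normal(size=t.size)
fit = least_squares(residuals, x0=[1.0, 1.0], args=(t, y), method='lm')
print(fit.x)               # close to the true parameters (2.5, 1.3)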
There are methods that deliberately avoid the use of gradient and Hessian information. Such approaches typically require many more iterations but can nevertheless save on computation overall. Some popular ones are the Simplex Method, Genetic Algorithms, Simulated Annealing, Particle Swarm and Ant Colony Optimization, and variants thereof.
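A minimal sketch of two such derivative-free approaches, again using SciPy's general-purpose routines on a standard test function: the Nelder–Mead simplex method for local search, and dual annealing (a simulated-annealing variant) for global search. Only function values are ever evaluated, never gradients or Hessians.

import numpy as np
from scipy.optimize import minimize, dual_annealing, rosen

# Simplex (Nelder-Mead): local search driven by reflections of a simplex.
local = minimize(rosen, x0=np.array([-1.2, 1.0]), method='Nelder-Mead')

# Simulated annealing (SciPy's dual_annealing variant): a stochastic global
# search that accepts occasional uphill moves to escape local minima.
anneal = dual_annealing(rosen, bounds=[(-5.0, 5.0), (-5.0, 5.0)])

print(local.x, anneal.x)   # both should approach the minimum at (1, 1)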
Many open-source, freely available and commercial packages exist for handling many kinds of problems; the statistics package R (Dalgaard, 2002) in particular is recommended as a good starting point. It is important to choose the best method for the problem at hand, that is, to match the optimization approach to the nature of the function, the variables, and the boundary conditions. There are many features to consider in choosing the method. Is the function linear, or not? Are the variables real or integer (or binary)? Are derivatives available? Is it a global or local optimization? What accuracy is required? Etc.
In macromolecular crystallography the optimization problems are so complex that hardly any existing package can be used as a black box without considerable development and adaptation. Below we outline how some basic crystallographic problems in refinement and modelling are addressed, starting from a foundation of pattern recognition.
11.2.2 A brief introduction to pattern recognition

Many tasks in the crystallographic structure-solution process may be seen as pattern recognition problems; in particular, model building and validation amount to little more than pattern matching. The objective of pattern recognition is to assign objects, which are described by a collection of numerical features (a feature vector), to the class to which they correspond. For example, a feature vector that assigns a wine to different quality classes may consist of features such as grape variety, year, location, colour, alcohol content, nose, balance, acidity, sediment, container type (bottle or tetrapak), etc. Some of these features can be very useful in discriminating between the classes (such as year and location in this example) while others may not have such high discriminative power (alcohol content, perhaps). Often features can be combined to provide new features that carry a similar power of discrimination (in this example, all these features could be approximated by the price tag).
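As a toy illustration of this idea, the sketch below encodes a wine as a small numerical feature vector and assigns it to the class with the nearest prototype vector; the feature encodings and all numbers are invented for the example.

import numpy as np

# Features: (year, location code, alcohol %, acidity) - a tiny feature vector.
class_prototypes = {
    'good':     np.array([2015.0, 1.0, 13.5, 3.4]),
    'mediocre': np.array([2019.0, 3.0, 11.0, 3.9]),
}

def classify(x):
    """Assign x to the class whose prototype is nearest (Euclidean distance)."""
    return min(class_prototypes, key=lambda c: np.linalg.norm(x - class_prototypes[c]))

print(classify(np.array([2016.0, 1.0, 13.0, 3.5])))   # -> 'good'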
Pattern recognition can be formulated as an optimization problem that minimizes the risk of making a wrong class assignment. Feature selection can likewise be formulated as an optimization problem, one that aims to maximize discriminative power, often subject to further constraints such as minimizing the dimensionality and therefore concentrating most of a population's variance on as few features as possible. Such combinations of features for the task of classification are particularly well explored by newly emerging computational techniques, such as neural networks (e.g. Bishop, 2002). These are generally based on the principle of a multilayer perceptron. Networks with as few as three layers and sigmoidal activation functions can approximate any smooth mapping to arbitrary accuracy (Lapedes and Farber, 1988). An important feature of neural networks is that they do not necessarily contain feedback loops, so their outputs can be calculated as explicit functions of the inputs.
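The following sketch of a small feedforward perceptron makes this explicit: with no feedback loops, the output is a closed-form function of the inputs. The weights here are random placeholders, standing in for values that would normally be obtained by training.

import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)   # input layer  -> hidden layer
W2, b2 = rng.normal(size=(1, 8)), np.zeros(1)   # hidden layer -> output layer

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mlp(x):
    """Three layers (input, sigmoidal hidden, output); purely feedforward."""
    hidden = sigmoid(W1 @ x + b1)
    return sigmoid(W2 @ hidden + b2)

print(mlp(np.array([0.2, -0.1, 0.5, 0.05])))    # a class score in (0, 1)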
In order to achieve the goal of classification, a good classifier is needed. Supervised learning is a popular approach that may be used to train the classifier to