MODEL BUILDING, REFINEMENT, AND VALIDATION 159
A better alternative is often the conjugate gradient method. Newton's method uses gradient and Hessian information to compute the next point, in effect approximating the objective function at each point by a quadratic function and finding the minimum of that quadratic. A drawback is that the Hessian may not be easy to compute, and even less so its inverse, which is needed for the step computation. The matrix inversion can be an unstable and computationally very expensive operation. In this context the quasi-Newton, Gauss–Newton and Levenberg–Marquardt methods, and especially the Broyden–Fletcher–Goldfarb–Shanno (BFGS) method, should be mentioned. The interested reader is advised to consult the specialized literature for further details, or Tronrud (2004) for a crystallographic overview. In particular, the Numerical Recipes books (Press et al., 2002) provide a good introduction to many important approaches, with implementation details.
There are also methods that deliberately avoid the use of gradient and Hessian information. Such approaches typically require many more iterations but can nevertheless save overall on computation. Some popular ones are the Simplex Method, Genetic Algorithms, Simulated Annealing, Particle Swarm and Ant Colony Optimization, and variants thereof.
Many open-source, freely available and commercial packages exist for handling many kinds of problems; the statistics package R (Dalgaard, 2002) in particular is recommended as a good starting point. It is important to choose the best method for the problem at hand, that is, to match the optimization approach to the nature of the function, the variables, and the boundary conditions. There are many features to consider in choosing the method. Is the function linear or not? Are the variables real or integer (or binary)? Are derivatives available? Is the optimization global or local? What accuracy is required? And so on.
In macromolecular crystallography the optimization problems are so complex that hardly any existing package can be used as a black box without considerable development and adaptation. Below we outline how some basic crystallographic problems in refinement and modelling are addressed, starting from a foundation of pattern recognition.

11.2.2 A brief introduction to pattern recognition

Many tasks in the crystallographic structure solution process may be seen as pattern recognition problems; in particular, model building and validation do little more than pattern matching. The objective of pattern recognition is to assign objects, each described by a collection of numerical features (a feature vector), to the class to which they correspond. For example, a feature vector used to assign a wine to different quality classes may consist of features such as grape variety, year, location, colour, alcohol content, nose, balance, acidity, sediment, container type (bottle or tetrapak), etc. Some of these features can be very useful in discriminating between the classes (such as year and location in this example) while others may not have such high discriminative power (such as alcohol content, perhaps). Often features can be combined to provide new features that carry a similar power of discrimination (in this example, all these features could be approximated by the price tag).
Pattern recognition can be formulated as an optimization problem that minimizes the risk of making a wrong class assignment. Feature selection can likewise be formulated as an optimization problem that aims at maximizing the discriminative power, often subject to further constraints, such as minimizing the dimensionality and thereby concentrating most of a population's variance on as few features as possible. Such combinations of features for the task of classification are particularly well explored by newly emerging computational techniques, such as neural networks (e.g. Bishop, 2002). These are generally based on the principle of a multilayer perceptron. Networks with as few as three layers and sigmoidal activation functions can approximate any smooth mapping to arbitrary accuracy (Lapedes and Farber, 1988). An important feature of such neural networks is that they do not necessarily contain feedback loops, so their outputs can be calculated as explicit functions of the inputs.
In order to achieve the goal of classification, a good classifier is needed. Supervised learning is a popular approach that may be used to train the classifier to
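The supervised train-then-classify workflow described above can be illustrated with a deliberately simple classifier. The sketch below (plain Python; the two-feature "wine" values and class labels are invented for illustration) trains a nearest-centroid classifier on labelled feature vectors and then assigns a new object to the class with the closest centroid. A multilayer perceptron would replace the centroid rule with learned weights, but the overall workflow is the same.

```python
import math

def train_centroids(samples):
    """Supervised 'training': average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [x / counts[label] for x in acc]
            for label, acc in sums.items()}

def classify(centroids, features):
    """Assign a feature vector to the class with the nearest centroid."""
    def dist(centroid):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(features, centroid)))
    return min(centroids, key=lambda label: dist(centroids[label]))

# Hypothetical training set: (year score, location score) -> quality class.
training = [
    ([0.9, 0.8], "good"), ([0.8, 0.9], "good"),
    ([0.2, 0.3], "poor"), ([0.3, 0.1], "poor"),
]
centroids = train_centroids(training)
print(classify(centroids, [0.85, 0.75]))  # classify a new, unlabelled object
```

The centroid rule is the simplest example of the risk-minimization view: each object is assigned to the class whose training examples it most resembles in feature space.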