Page 202 - Macromolecular Crystallography

CHAPTER 13

Electron density fitting and structure validation

Mike Carson

13.1 Introduction

The Human Genome Project went three-dimensional in late 2000. ‘Structural genomics’ efforts will determine the structures of thousands of new proteins over the next decade. These initiatives seek to streamline and automate every experimental and computational aspect of the structural determination pipeline, with most of the steps involved covered in previous chapters of this volume. At the end of the pipeline, an atomic model is built and iteratively refined to best fit the observed data. The final atomic model, after careful analysis, is deposited in the Protein Data Bank, or PDB (Berman et al., 2000). About 25,000 unique protein sequences are currently in the PDB. High-throughput and conventional methods will dramatically increase this number, and it is crucial that these new structures be of the highest quality (Chandonia and Brenner, 2006).

This chapter will address software systems to interactively fit molecular models to electron density maps and to analyse the resulting models. This chapter is heavily biased toward proteins, but the programs can also build nucleic acid models. First, a brief review of molecular modelling and graphics is presented. Next, the best current and freely available programs are discussed with respect to their performance on common tasks. Finally, some views on the future of such software are given.

13.2 Initial molecular models

Small molecule crystal structures solved through direct methods yield very accurate atomic positions at much higher resolution than is typical for proteins. Currently about 200,000 error-free organic compounds with conventional R factors less than 0.05 are available through the Cambridge Structural Database (CSD) (Allen et al., 1979). These data have been mined for conformational analysis, hydrogen bonding directionality, non-bonded packing interaction, and more, as recently reviewed (Allen and Motherwell, 2002). The CSD provides an invaluable source of coordinate geometry for inhibitors and cofactors, which should be trusted more than the energy minimized output of any modelling program.

A common feature of modelling and refinement programs is a dictionary of ideal residues derived from the results of small-molecule crystallography. Ideal bond lengths and angles for the amino acid and nucleic acid building blocks of macromolecules have been gathered from the CSD (Engh and Huber, 1991). The atomic bond and angle parameters are tightly constrained for macromolecular refinement and may be regarded as fixed, with the only degrees of freedom coming from torsional rotation about single bonds.

The favoured dihedral angles for protein main chains were derived from energy considerations of steric clashes in peptides, giving the well-known Ramachandran plot (Ramachandran and Sasisekharan, 1968). These phi/psi combinations characterize the elements of secondary structure. Accurate main chain models can be constructed from ‘spare parts’, that is short pieces of helices, sheets, turns, and random coils taken from highly refined structures, provided a series of C-alpha positions can be established from the electron density map