Page 136 - Chiral Separation Techniques


             4.6 Dealing with Molecular Similarity


             Besides 3D structure database searches, molecular similarity is also widely used
             for drug design in the pharmaceutical industry, as shown by two recent reviews
             [23, 24]. In particular, 2D fingerprints, used to calculate the 2D topological
             similarity of molecules, have been shown to be valid for quantifying molecular
             diversity and thus for managing the overall diversity of structure databases [25].
             In this section, we describe how similarity measures can be applied to establish
             relationships between CSPs by producing a molecular similarity matrix displayed as
             a dot plot. More precisely, the molecular similarity calculations applied to
             CHIRBASE provide a way of comparing the samples within a dataset, as well as
             comparing different datasets, using the following two methodologies:
             1. Select a set of compounds resolved on a given CSP, calculate the similarity
               indices between all possible molecule pairs, and then use these indices to build a
               similarity matrix containing relevant information about the structural diversity
               within the set of samples separated on this CSP.
             2. Select two sets of compounds resolved on two different CSPs, calculate the
               similarity indices between all possible molecule pairs of these two sets, and then
               use these indices to build a similarity matrix containing relevant information
               about the structural affinities of these two CSPs.
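The two methodologies above can be sketched as follows. This is a minimal illustration, not the in-house CHIRBASE program: it assumes each molecule is represented as a plain Python set of integer key indices (a hypothetical stand-in for the ISIS structural keys), and the function names `tanimoto`, `within_csp_matrix` and `between_csp_matrix` are hypothetical. Scores are scaled to 0 to 100, the range reported by ISIS. Methodology 1 yields a square, symmetric matrix; methodology 2 yields a rectangular one, with rows for the first CSP and columns for the second.

```python
from typing import FrozenSet, List, Sequence

# Hypothetical representation: the indices of the structural keys set for one molecule.
KeySet = FrozenSet[int]


def tanimoto(a: KeySet, b: KeySet) -> float:
    """Tanimoto coefficient scaled to 0-100: shared keys / keys present in either molecule."""
    union = len(a | b)
    if union == 0:
        return 0.0  # assumption: two molecules with no keys at all are scored as dissimilar
    return 100.0 * len(a & b) / union


def within_csp_matrix(mols: Sequence[KeySet]) -> List[List[float]]:
    """Methodology 1: all pairs within one set of compounds (square, symmetric matrix)."""
    n = len(mols)
    m = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):  # compute each pair once and mirror it
            m[i][j] = m[j][i] = tanimoto(mols[i], mols[j])
    return m


def between_csp_matrix(set1: Sequence[KeySet], set2: Sequence[KeySet]) -> List[List[float]]:
    """Methodology 2: all pairs across two sets (rows: set1, columns: set2)."""
    return [[tanimoto(a, b) for b in set2] for a in set1]


# Toy data, not taken from CHIRBASE.
csp1 = [frozenset({1, 2, 3}), frozenset({2, 3, 4})]
csp2 = [frozenset({1, 2, 3, 4}), frozenset({7, 8})]
print(within_csp_matrix(csp1))         # [[100.0, 50.0], [50.0, 100.0]]
print(between_csp_matrix(csp1, csp2))  # [[75.0, 0.0], [75.0, 0.0]]
```

Only the upper triangle is computed in the within-set case, since the Tanimoto coefficient is symmetric; the cross-set matrix has no such symmetry, so every pair is evaluated.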
               The similarity matrices are constructed by an in-house program developed within
             CHIRBASE using the ISIS application development kit. They contain the similarity
             coefficients computed with the Tanimoto method. In ISIS, the Tanimoto coefficients
             are calculated from a set of binary descriptors, or molecular keys, that encode the
             structural fragments of the molecules.
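A binary key descriptor can be pictured as a fixed-length bit vector in which bit k is on when structural fragment k is present. The sketch below packs key indices into a Python integer and computes the Tanimoto coefficient from bitwise AND and OR. It is an assumption-laden illustration: the 960-key length comes from the ISIS key set mentioned in this section, but the key indices used here are hypothetical, the actual ISIS key definitions are not reproduced, and nothing is implied about how ISIS stores its keys internally.

```python
N_KEYS = 960  # number of key codes available in ISIS, as stated in the text


def keys_to_bits(key_indices):
    """Pack key indices (0..N_KEYS-1) into a Python int used as a binary descriptor."""
    bits = 0
    for k in key_indices:
        if not 0 <= k < N_KEYS:
            raise ValueError(f"key index {k} is outside 0..{N_KEYS - 1}")
        bits |= 1 << k  # switch on bit k: fragment k is present
    return bits


def popcount(x):
    """Number of bits set to 1 (portable alternative to int.bit_count in Python 3.10+)."""
    return bin(x).count("1")


def tanimoto_bits(fp_a, fp_b):
    """Tanimoto coefficient in [0, 1]: keys in both molecules / keys in either molecule."""
    both = popcount(fp_a & fp_b)
    either = popcount(fp_a | fp_b)
    return 0.0 if either == 0 else both / either


# Hypothetical key indices for two molecules.
a = keys_to_bits([0, 5, 17, 959])
b = keys_to_bits([5, 17, 300])
print(tanimoto_bits(a, b))  # 0.4  (2 shared keys out of 5 distinct keys)
```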
               These structural key descriptors encode a remarkable number of relevant molecular
             arrangements, covering each type of interaction involved in ligand-receptor binding
             [26]. Every structure in a database is represented by one or more of the 960 key
             codes available in ISIS. If A and B denote the sets of key codes present in two
             molecules, the Tanimoto coefficient is given by:

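                 $$T_{AB} = \frac{[A \cap B]}{[A \cup B]} = \frac{[A \cap B]}{[A] + [B] - [A \cap B]}$$

             where [X] denotes the number of key codes in the set X.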
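As a worked illustration with hypothetical key counts (not CHIRBASE data), suppose molecule A sets 40 keys, molecule B sets 30 keys, and 20 of these keys are shared:

                 $$T_{AB} = \frac{[A \cap B]}{[A] + [B] - [A \cap B]} = \frac{20}{40 + 30 - 20} = \frac{20}{50} = 0.40$$

On the 0 to 100 scale used by ISIS, this pair would therefore score 40.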
               In ISIS, the similarity value ranges from 0 to 100. A value of 0 means that the two
             molecules share no key codes and are therefore totally dissimilar, whereas a value
             of 100 is obtained when the two molecules have exactly the same set of keys. By
             convention, these matrices are called similarity matrices, because larger values
             indicate greater similarity between items. Dot plots of the matrix are produced by
             another in-house application developed in Visual Basic using the InovaGIS object
             library [27]. The pixels in the map are color-coded by similarity coefficient,
             providing a visual representation of the similarities among one or two sets of
             molecules. Such a representation is a simple but very powerful means of quickly
             visualizing and finding trends in very large data