Page 136 - Chiral Separation Techniques
4.6 Dealing with Molecular Similarity
Besides 3D structure database searches, molecular similarity is also widely used for
drug design in the pharmaceutical industry, as demonstrated by two recent reviews
[23, 24]. In particular, 2D fingerprints, which are used to calculate the 2D topological
similarity of molecules, were found to be valid for quantifying molecular diversity and
thus for managing the global diversity of structure databases [25]. In this section, we
describe the application of similarity measures to determine relationships between
CSPs by producing a molecular similarity matrix displayed as a dot plot. More
precisely, molecular similarity calculations applied to CHIRBASE provide a
way of comparing the samples within a dataset, as well as comparing different
datasets, using the following two methodologies:
1. Select a set of compounds resolved on a given CSP, calculate the similarity
indices between all possible molecule pairs, and then use these indices to build a
similarity matrix containing relevant information about the structural diversity
within the set of samples separated on this CSP.
2. Select two sets of compounds resolved on two different CSPs, calculate the sim-
ilarity indices between all possible molecule pairs of these two sets, and then use
these indices to build a similarity matrix containing relevant information about the
structural affinities of these two CSPs.
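The two methodologies above can be sketched in a few lines of Python. This is only an illustration, not the ISIS-based in-house program: the function names `tanimoto` and `similarity_matrix` are hypothetical, each molecule is assumed to be represented by a set of structural key codes, the Tanimoto coefficient on a 0 to 100 scale is assumed as the similarity index, and the integer key codes are made up.

```python
def tanimoto(keys_a, keys_b):
    """Tanimoto coefficient (0-100 scale) for two sets of structural key codes."""
    a, b = set(keys_a), set(keys_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty key sets (assumption)
    return 100.0 * len(a & b) / len(union)


def similarity_matrix(set_1, set_2=None):
    """Return pairwise similarities as a list of rows.

    One argument: all pairs within a single set (methodology 1).
    Two arguments: all pairs between two sets (methodology 2).
    """
    if set_2 is None:
        set_2 = set_1
    return [[tanimoto(x, y) for y in set_2] for x in set_1]


# Hypothetical key sets: integers stand in for structural key codes.
csp_a = [{1, 2, 3, 4}, {1, 2, 5}, {6, 7}]   # compounds resolved on one CSP
csp_b = [{1, 2, 3, 4}, {6, 7, 8}]           # compounds resolved on another CSP

within_a = similarity_matrix(csp_a)          # square, symmetric matrix
between_ab = similarity_matrix(csp_a, csp_b) # rectangular matrix
```

In this sketch the within-set matrix is square and symmetric with 100 on its diagonal, while the between-set matrix has one row per compound of the first set and one column per compound of the second.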
The similarity matrices are constructed by an in-house program developed within
CHIRBASE using the application development kit of ISIS. They contain the
similarity coefficients as expressed by the Tanimoto method. In ISIS, the Tanimoto
coefficients are calculated from a set of binary descriptors, or molecular keys, coding the
structural fragments of the molecules.
These structural key descriptors incorporate a remarkable number of pertinent
molecular arrangements covering each type of interaction involved in ligand-receptor
binding [26]. Every structure in a database is represented by one or more
of the 960 key codes available in ISIS. If two molecules are described by the sets of
key codes A and B, respectively, then the Tanimoto coefficient is given by:

$$T(A, B) = \frac{|A \cap B|}{|A \cup B|} = \frac{|A \cap B|}{|A| + |B| - |A \cap B|}$$

where |A ∩ B| is the number of key codes common to both molecules and |A ∪ B| is
the number of key codes present in at least one of them.
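As a worked illustration of the Tanimoto coefficient in its standard set form, consider made-up key counts (assumed for this example only, not taken from CHIRBASE): one molecule sets 40 key codes, a second sets 50, and 30 key codes are common to both.

$$T = \frac{|A \cap B|}{|A| + |B| - |A \cap B|} = \frac{30}{40 + 50 - 30} = \frac{30}{60} = 0.5$$

Expressed on a 0 to 100 scale, this pair would score 50. Two molecules with identical key sets give a ratio of 1, and two molecules with no key code in common give 0.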
In ISIS, the similarity value ranges from 0 to 100. A similarity value of 0
means that the two molecules are totally dissimilar, whereas a value of 100 is
obtained when the two molecules are identical. By convention, the matrices are called
similarity matrices, as larger numbers indicate greater similarity between
items. Dot plots of the matrix are produced by another in-house application
developed with Visual Basic using the InovaGIS object library [27]. The pixels in the map
are color-coded by similarity coefficient, providing a visual representation of
similarities among one or two sets of molecules. Such a representation is a simple but
very powerful means for quickly visualizing and finding trends in very large data