Page 143 - Chiral Separation Techniques
120 4 CHIRBASE: Database Current Status and Derived Research Applications using …
The first tree induction algorithm, ID3 (Iterative Dichotomizer version 3), was
developed by Quinlan [38]. C4.5 and C5 are subsequent improved versions of ID3.
In our study, we used the MC4 decision tree algorithm, which is available in the
MLC++ package [39]. MC4 and C4.5 use the same algorithm with different default
parameter settings.
This study is intended only to illustrate and evaluate the decision tree approach
for CSP prediction, using as attributes the 166 molecular keys publicly available
in ISIS. The assay was carried out on a CHIRBASE file of 3000 molecular structures
corresponding to a list of samples resolved with an α value greater than 1.8.
For each solute, we selected from CHIRBASE the commercial CSP providing the highest
enantioselectivity. This procedure led to a selection of 18 commercially available
CSPs, sold under the following names: Chiralpak AD [28], Chiral-AGP [40],
Chiralpak AS [28], Resolvosil BSA-7 [41], Chiral-CBH [40], CTA-I (microcrystalline
cellulose triacetate) [42], Chirobiotic T [43], Crownpak CR(+) [28],
Cyclobond I [43], DNB-Leucine covalent [29], DNB-Phenylglycine covalent [29],
Chiralcel OB [28], Chiralcel OD [28], Chiralcel OJ [28], Chiralpak OT(+) [28],
Ultron-ES-OVM [44], Whelk-O 1 [29] and (R,R)-β-Gem 1 [29].
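The selection step described above (keep only separations with α above 1.8, then retain for each solute the CSP giving the highest α) can be sketched as follows. This is a minimal illustration, not the procedure actually run against CHIRBASE: the record layout, the solute identifiers, the α values and the function name `best_csp_per_solute` are all hypothetical.

```python
from collections import Counter

# Hypothetical records: (solute_id, csp_name, alpha). All values are made up
# for illustration and do not come from CHIRBASE.
records = [
    ("solute-1", "Chiralcel OD", 2.10),
    ("solute-1", "Chiralpak AD", 2.45),
    ("solute-2", "Whelk-O 1", 1.95),
    ("solute-2", "Chiralcel OJ", 1.60),
    ("solute-3", "Cyclobond I", 1.40),
]

ALPHA_THRESHOLD = 1.8  # threshold quoted in the text


def best_csp_per_solute(rows, threshold=ALPHA_THRESHOLD):
    """Return {solute: csp} keeping, for each solute, the CSP with the highest alpha."""
    best = {}
    for solute, csp, alpha in rows:
        if alpha <= threshold:
            continue  # discard separations at or below the threshold
        if solute not in best or alpha > best[solute][1]:
            best[solute] = (csp, alpha)
    return {solute: csp for solute, (csp, _) in best.items()}


labels = best_csp_per_solute(records)      # class label (CSP) for each solute
class_counts = Counter(labels.values())    # number of solutes assigned to each CSP
```

Each retained solute then carries exactly one class label (a CSP name), which is the form a decision tree learner expects.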
After the data file is imported into MLC++ and “gain-ratio” is selected as the
splitting method, the program builds the full tree shown in Fig. 4-16. The tree has
631 nodes, 316 leaves and 107 attributes. The attributes are molecular key features
and the leaves are CSPs.
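To make the “gain-ratio” criterion concrete, the sketch below computes the textbook gain ratio (information gain divided by split information) for binary molecular keys and picks the key with the highest score. It is not the MLC++ implementation, which may apply additional heuristics; the key names, bit vectors and function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * math.log2(n / total) for n in Counter(values).values())


def gain_ratio(key_bits, labels):
    """Textbook gain ratio of splitting `labels` on one molecular key."""
    total = len(labels)
    groups = {}
    for bit, label in zip(key_bits, labels):
        groups.setdefault(bit, []).append(label)
    # Weighted entropy of the class labels after the split
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - remainder
    split_info = entropy(key_bits)  # entropy of the split itself
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical toy data: four solutes, two CSP classes, two binary keys.
labels = ["Chiralcel OD", "Chiralcel OD", "Whelk-O 1", "Whelk-O 1"]
keys = {
    "key_a": [1, 1, 0, 0],  # separates the two CSPs perfectly
    "key_b": [1, 0, 1, 0],  # carries no information about the CSP
}

best_key = max(keys, key=lambda name: gain_ratio(keys[name], labels))
```

A tree builder would apply this choice recursively: split the solutes on `best_key`, then repeat the selection within each branch until the leaves are sufficiently pure.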
Fig. 4-16. Decision tree built by MLC++ from the analysis of 3000 solutes resolved on 18
commercially available CSPs. The magnifying glass shows the region zoomed in Fig. 4-17.