Page 142 - Chiral Separation Techniques


             complex task of CSP classification. Indeed, the aim was essentially to compare CSP
             applications through simple, easily interpretable similarity maps that simplify
             the analysis of large data matrices. In practice, it is impossible to test all
             existing CSPs. Comparing maps shows directly whether a given CSP has a broad
             range of applications. For screening purposes, a CSP choice guided by such
             studies should be much better than a random selection of CSPs.
               This approach can also provide a straightforward procedure for predicting
             the potential of newly designed CSPs. Similarity maps can also depict the
             resemblance between CSPs when no information is available on the structural
             requirements for interaction with a CSP. Compared with other methods, such as
             hierarchical clustering using structure-based fingerprints, our approach
             requires much less CPU time (less than 1 h to build a map of 250 000 dots).
             This rapid diversity analysis may therefore prove useful in other areas, for
             instance in investigating diversity in databases of high-throughput screening results.




             4.7 Decision Tree using Application of Machine Learning


             Machine learning offers one of the simplest approaches to data mining and provides
             solutions in many fields of chemistry: quality control in analytical chemistry [31],
             interpretation of mass spectra [32], prediction of pharmaceutical properties
             [33, 34] and drug design [35].
               The use of intelligent systems in chiral chromatography began with an original
             project called “CHIRULE”, developed by Stauffer and Dessy [36], who combined
             similarity searching with an expert system for CSP prediction. This issue has
             recently been revisited by Bryant and co-workers, who developed the first
             expert system for selecting Pirkle-type CSPs [37].
               Machine learning can analyze a large dataset and determine which information is
             most pertinent. This generalized information can then be converted into knowledge
             by generating rule sets that enable faster and more relevant decisions.
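             One common way to obtain such a rule set, assuming the knowledge is held in a decision
             tree, is to turn every path from the root to a leaf into an IF ... THEN rule. The minimal
             Python sketch below is only an illustration of that idea and not the authors' method: the
             tree layout (dicts for question nodes, strings for leaves), the functions extract_rules and
             format_rule, and the attributes attr_1/attr_2 and classes class_A/class_B are all
             hypothetical.

```python
# Hypothetical tree layout: a parent node is {"attribute": name, "branches": {value: subtree}},
# a leaf is a plain string holding the class label.


def extract_rules(tree, conditions=()):
    """Return one (conditions, class) rule per root-to-leaf path."""
    if isinstance(tree, str):  # leaf: the path collected so far is a complete rule
        return [(list(conditions), tree)]
    rules = []
    for value, subtree in tree["branches"].items():
        # add "attribute = value" to the path and keep descending
        rules.extend(extract_rules(subtree, conditions + ((tree["attribute"], value),)))
    return rules


def format_rule(conditions, label):
    """Render a rule as readable IF ... THEN text."""
    clause = " AND ".join(f"{attr} = {value}" for attr, value in conditions) or "TRUE"
    return f"IF {clause} THEN {label}"


# Hypothetical toy tree (invented for illustration, not chromatographic data).
toy_tree = {
    "attribute": "attr_1",
    "branches": {
        "yes": "class_A",
        "no": {"attribute": "attr_2", "branches": {"low": "class_B", "high": "class_A"}},
    },
}

for conds, label in extract_rules(toy_tree):
    print(format_rule(conds, label))
```

             Each leaf produces exactly one rule, so the rule set is as large as the number of leaves
             and can be read without inspecting the tree itself.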
               A decision tree is made up of two types of nodes: parent nodes and leaves. Each
             parent node corresponds to a question or an attribute; each leaf node designates a
             single class. The branches leaving a parent node correspond to a split of the
             population at that node according to the answers to the question or the values of
             the attribute. Each subset of the population is split again, recursively, using
             different questions or attributes until a subset belongs to a single class. At that
             point, the branch of the tree stops with a leaf node labeled with that class.
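             The recursive splitting described above can be sketched in Python as follows. The text
             does not say how the question at each node is chosen, so this sketch assumes the ID3-style
             information-gain criterion. It also adds a majority-class fallback for the case where no
             attribute is left to split on. The function names, the toy attributes attr_1/attr_2 and
             the classes class_A/class_B are hypothetical, and the data are invented purely for
             illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def build_tree(rows, labels, attributes):
    """Recursively split the population until each subset holds a single class."""
    if len(set(labels)) == 1:  # pure subset: stop with a leaf labeled with that class
        return labels[0]
    if not attributes:  # no question left (assumption): label the leaf with the majority class
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):
        # information gain of splitting on attr (ID3-style criterion, an assumption)
        remainder = 0.0
        for value in set(r[attr] for r in rows):
            subset = [lab for r, lab in zip(rows, labels) if r[attr] == value]
            remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(attributes, key=gain)  # question asked at this parent node
    remaining = [a for a in attributes if a != best]
    branches = {}
    for value in sorted(set(r[best] for r in rows)):  # one branch per answer
        idx = [i for i, r in enumerate(rows) if r[best] == value]
        branches[value] = build_tree([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return {"attribute": best, "branches": branches}


# Hypothetical toy population (invented, not experimental results).
rows = [
    {"attr_1": "yes", "attr_2": "low"},
    {"attr_1": "yes", "attr_2": "high"},
    {"attr_1": "yes", "attr_2": "low"},
    {"attr_1": "no", "attr_2": "low"},
    {"attr_1": "no", "attr_2": "high"},
    {"attr_1": "no", "attr_2": "low"},
]
labels = ["class_A", "class_A", "class_A", "class_B", "class_A", "class_B"]

print(build_tree(rows, labels, ["attr_1", "attr_2"]))
```

             On this toy population, attr_1 separates the classes better than attr_2, so it becomes the
             root question. Only the "no" subset is still mixed, and it is split again on attr_2.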
               A tree is read from the root to the leaves. We begin at the root of the tree, which
             contains the whole population. Then, following the branch that matches the answer to
             the question asked at each parent node, we finally reach a leaf node. The label on that
             leaf gives the class, which is the conclusion induced from the tree.
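             Reading a tree in this way is a simple loop from the root down to a leaf. The Python sketch
             below assumes a hypothetical layout in which parent nodes are dicts holding a question
             (attribute) and one branch per answer, and leaves are class-label strings. The function
             classify, the attributes attr_1/attr_2 and the classes class_A/class_B are illustrative
             names only.

```python
def classify(tree, sample):
    """Walk from the root to a leaf, following the branch that matches each answer."""
    node = tree
    while not isinstance(node, str):  # parent node: ask its question
        answer = sample[node["attribute"]]
        # raises KeyError if this answer never occurred when the tree was built
        node = node["branches"][answer]
    return node  # leaf label = predicted class


# Hypothetical toy tree (invented for illustration).
toy_tree = {
    "attribute": "attr_1",
    "branches": {
        "yes": "class_A",
        "no": {"attribute": "attr_2", "branches": {"low": "class_B", "high": "class_A"}},
    },
}

print(classify(toy_tree, {"attr_1": "no", "attr_2": "high"}))  # class_A
```

             A sample only needs values for the attributes on its own path. A sample with attr_1 = "yes"
             reaches a leaf after a single question, without attr_2 ever being consulted.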