Page 142 - Chiral Separation Techniques
complex task of CSP classification. Indeed, the aim was essentially to compare CSP
applications through simple and easily interpretable similarity maps, thereby simplifying
the analysis of large data matrices. From a practical point of view, it is impossible to
test all the existing CSPs. The comparison of maps allows a direct classification of
whether a given CSP presents a broad variety of applications or not. For screening
purposes, a CSP choice guided by such studies should be much better than a
random selection of CSPs.
Furthermore, this approach can also supply a straightforward procedure to predict
the potentialities of newly designed CSPs. Also, similarity maps can serve to depict
resemblance between CSPs when there is no information available regarding the
structural requirements for interaction with the CSP. Compared to other methods such
as hierarchical clustering approaches using structure-based fingerprints, our
approach requires much less CPU time (less than 1 h to build a map of 250 000 dots).
Thus, this rapid diversity analysis process may prove useful in other areas, such
as aiding in investigating diversity in databases of high-throughput screening results.
4.7 Decision Trees: An Application of Machine Learning
Machine learning offers an accessible approach to data mining and has provided
solutions in many fields of chemistry: quality control in analytical chemistry [31],
interpretation of mass spectra [32], as well as the prediction of pharmaceutical
properties [33, 34] and drug design [35].
The use of intelligent systems in chiral chromatography started with an original
project called “CHIRULE”, developed by Stauffer and Dessy [36], who combined
similarity searching with an expert system for CSP prediction. This issue
has recently been reconsidered by Bryant and co-workers, who developed the
first expert system for the choice of Pirkle-type CSPs [37].
Machine learning can analyze a large dataset and determine what information is
most pertinent. Such generalized information can then be converted into knowledge
through the generation of rule sets that will enable faster and more relevant decisions.
A decision tree consists of two types of nodes: parent nodes and leaves. Each
parent node corresponds to a question or an attribute; each leaf node designates a
single class. The branches connected to a parent node correspond to a split of the
node's population according to the answers to the question or the values of the attribute.
Each subset of the population is split again, recursively, using different questions or
attributes, until a subset belongs to a single class. At that point, the branch of the tree
stops with a leaf node labeled with that class.
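This recursive splitting can be sketched in a few lines of Python. The dictionary representation, the function name `build_tree`, and the attribute names (`polarity`, `rings`) are hypothetical illustrations, not the implementation used in the work discussed here:

```python
# Minimal sketch: recursively split a population on its attributes
# until each subset belongs to a single class. All names are illustrative.

def build_tree(samples, attributes):
    classes = {s["class"] for s in samples}
    if len(classes) == 1:
        # The subset belongs to a single class: stop with a leaf node.
        return {"leaf": classes.pop()}
    # Otherwise, split the population node on the next attribute,
    # creating one branch per observed value of that attribute.
    attr, remaining = attributes[0], attributes[1:]
    subsets = {}
    for s in samples:
        subsets.setdefault(s[attr], []).append(s)
    return {"attribute": attr,
            "branches": {value: build_tree(subset, remaining)
                         for value, subset in subsets.items()}}

# Toy population: each sample carries attribute values and a class label.
population = [
    {"polarity": "high", "rings": "yes", "class": "A"},
    {"polarity": "high", "rings": "no",  "class": "B"},
    {"polarity": "low",  "rings": "yes", "class": "B"},
    {"polarity": "low",  "rings": "no",  "class": "B"},
]
tree = build_tree(population, ["polarity", "rings"])
```

Here the "low" subset is pure and becomes a leaf immediately, while the "high" subset is split again on the second attribute, illustrating the recursion.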
A tree is read from root to leaves. We begin at the root of the tree, which contains
the whole population. Then, following the relevant branches according to the question
asked at each parent node, we finally reach a leaf node. The label on that leaf node
gives the class, which is the conclusion induced from the tree.
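A root-to-leaf traversal can be sketched as follows, assuming a hypothetical dict-based tree in which a parent node holds an "attribute" and its "branches", and a leaf node holds a single "leaf" class label; the names and attribute values are illustrative only:

```python
# Minimal sketch: read a decision tree from root to leaf.

def classify(tree, sample):
    # Starting at the root, follow the branch matching the sample's
    # value for the current node's attribute until a leaf is reached.
    node = tree
    while "leaf" not in node:
        node = node["branches"][sample[node["attribute"]]]
    # The leaf's label is the class induced from the tree.
    return node["leaf"]

# Hand-built toy tree: the root asks about "polarity", then "rings".
toy_tree = {
    "attribute": "polarity",
    "branches": {
        "low": {"leaf": "B"},
        "high": {"attribute": "rings",
                 "branches": {"yes": {"leaf": "A"},
                              "no":  {"leaf": "B"}}},
    },
}
```

For example, `classify(toy_tree, {"polarity": "high", "rings": "yes"})` follows the "high" branch, then the "yes" branch, and returns the leaf label "A".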