234 8 Mining Additional Perspectives
8.5 Decision Mining
The case perspective focuses on properties of cases. Each case is characterized by
its case attributes, the attributes of its events, the path taken, and performance infor-
mation (e.g., flow times).
First, we focus on the influence of case and event attributes on the routing of
cases. In Fig. 8.9 there are two decision points:
• After registering the request (activity a) either a thorough examination (activity b)
or a casual examination (activity c) follows.
• After making a decision (activity e), activity g (pay compensation), activity h
(reject request), or activity f (reinitiate request) follows.
Both decision points are of type XOR-split: precisely one of several alternatives is
chosen. Decision mining aims to find rules explaining such choices in terms of char-
acteristics of the case [79]. For example, by analyzing the event log used to discover
Fig. 8.9 one could find that customers from the southern region are always checked
thoroughly and that requests by silver customers always get rejected. Clearly, a
classification technique like decision tree learning can be used to find such rules (see
Sect. 3.2). Recall that the input for decision tree learning is a table where every row
lists one categorical response variable (e.g., the chosen activity) and multiple pre-
dictor variables (e.g., properties of the customer). The decision tree aims to explain
the response variable in terms of the predictor variables.
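To make the construction of such a table concrete, the following minimal Python sketch derives one row per decision from a toy event log. The log, its attribute values, and the function name classification_table are hypothetical and serve only as illustration; they are not taken from the event log behind Fig. 8.9. The sketch assumes that the chosen alternative is the first of the candidate activities observed after the decision activity, which tolerates other activities occurring in between.

```python
from typing import Dict, List, Set

# Hypothetical toy log (illustrative only): each case has case attributes
# and a trace, i.e., the sequence of activity names recorded for that case.
toy_log = [
    {"attrs": {"region": "south", "type": "gold"}, "trace": ["a", "d", "b", "e", "g"]},
    {"attrs": {"region": "north", "type": "silver"}, "trace": ["a", "c", "d", "e", "h"]},
    {"attrs": {"region": "south", "type": "silver"}, "trace": ["a", "b", "d", "e", "h"]},
]


def classification_table(log: List[dict], decision_activity: str,
                         alternatives: Set[str]) -> List[Dict[str, str]]:
    """Build one row per occurrence of the decision activity.

    Predictor variables are the case attributes; the response variable
    "activity" is the first alternative seen after the decision activity.
    """
    rows = []
    for case in log:
        trace = case["trace"]
        for i, activity in enumerate(trace):
            if activity != decision_activity:
                continue
            for later in trace[i + 1:]:
                if later == decision_activity:
                    # Decision activity occurs again before any alternative;
                    # that occurrence is handled by the outer loop.
                    break
                if later in alternatives:
                    rows.append({**case["attrs"], "activity": later})
                    break
    return rows


# Rows for the first decision point: after a, either b or c is chosen.
for row in classification_table(toy_log, "a", {"b", "c"}):
    print(row)
```

The same function can be applied to the second decision point by passing "e" as the decision activity and {"f", "g", "h"} as alternatives. The resulting rows have the shape a decision tree learner expects: predictor variables plus one categorical response variable.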
Consider, for example, the situation shown in Fig. 8.14. Using three different
notations (YAWL, BPMN, and Petri nets) a choice is depicted: activity x is followed
by either activity y or activity z. The table in Fig. 8.14 shows different cases for
which this choice needs to be made. There are three predictor variables (type, region,
and amount) and one response variable (activity). Variables type, region, and activity
are categorical and variable amount is numerical. The predictor variables correspond
to knowledge known about the case at the point in time when the decision was made.
The response variable activity is determined based on a scan of the event log. The
event log will reveal whether x was followed by y or z. The table in Fig. 8.14 serves
as input for some decision tree learning algorithm as explained in Sect. 3.2. The
resulting decision tree can be rewritten into a rule. Based on the example table,
classification will show that the value of the response variable is y if the customer
is a gold customer and the amount is lower than € 500. Otherwise, the value of the
response variable is z, as shown in Fig. 8.14.
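The rule described above can be written as a simple predicate and checked against a classification table. The sketch below is only an illustration: the rows are invented and are not the rows of the table in Fig. 8.14, and the names predict_activity and rule_accuracy are hypothetical. It does not learn the tree; it only encodes the resulting rule and measures which fraction of the rows it explains.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[str, int]]

# Invented example rows (NOT the table of Fig. 8.14): three predictor
# variables (type, region, amount) and the response variable activity.
example_rows: List[Row] = [
    {"type": "gold", "region": "south", "amount": 300, "activity": "y"},
    {"type": "gold", "region": "west", "amount": 800, "activity": "z"},
    {"type": "silver", "region": "north", "amount": 200, "activity": "z"},
    {"type": "gold", "region": "east", "amount": 499, "activity": "y"},
    {"type": "silver", "region": "south", "amount": 900, "activity": "z"},
]


def predict_activity(row: Row) -> str:
    """Rule from the text: y for gold customers with amount below 500, else z."""
    if row["type"] == "gold" and row["amount"] < 500:
        return "y"
    return "z"


def rule_accuracy(rows: List[Row]) -> float:
    """Fraction of rows whose observed activity matches the rule's prediction."""
    hits = sum(1 for row in rows if predict_activity(row) == row["activity"])
    return hits / len(rows)


print(rule_accuracy(example_rows))
```

Note that region does not appear in the rule: a decision tree learner only uses those predictor variables that help to explain the response variable.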
Petri nets cannot express OR-splits and joins directly. However, in higher-level
languages like BPMN and YAWL one can express such behavior. Figure 8.15 shows
an OR-split using the YAWL and BPMN notation: activity x is followed by y, or z,
or y and z. Note that the response variable activity is still categorical and can be
determined by scanning the log. The table in Fig. 8.15 can be analyzed using a
decision tree learner and the result can be transformed into one rule for each of the
output arcs. The response variable is “just y” if the customer is a gold customer
and the amount is less than € 500, the response variable is “just z” if the customer
is a silver customer and the amount is at least € 500, and the response variable is