Page 147 -
P. 147
5.2 A Simple Algorithm for Process Discovery 129
tion is related to the notion of overfitting. An overfitting model does not generalize
enough, i.e., it is too specific and too much driven by examples in the event log. The
fourth quality criterion is related to Occam’s Razor which states that “one should not
increase, beyond what is necessary, the number of entities required to explain any-
thing” (see Sect. 3.6.3). Following this principle, we look for the “simplest process
model” that can explain what is observed in the event log.
It turns out to be challenging to balance the four quality criteria. For instance, an
oversimplified model is likely to have a low fitness or lack of precision. Moreover,
there is an obvious trade-off between underfitting and overfitting. We discuss these
four quality criteria later in this chapter. However, we first introduce a concrete
process discovery algorithm.
5.2 A Simple Algorithm for Process Discovery
This section introduces the α-algorithm [103]. This algorithm is an example of a
γ function as mentioned in Definition 5.2, i.e., given a simple event log it pro-
duces a Petri net that (hopefully) can replay the log. The α-algorithm was one of the
first process discovery algorithms that could adequately deal with concurrency (see
Sect. 6.5). However, the α-algorithm should not be seen as a very practical mining
technique as it has problems with noise, infrequent/incomplete behavior, and com-
plex routing constructs. Nevertheless, it provides a good introduction into the topic.
The α-algorithm is simple and many of its ideas have been embedded in more com-
plex and robust techniques. We will use the algorithm as a baseline for discussing
the challenges related to process discovery and for introducing more practical algo-
rithms.
5.2.1 Basic Idea
∗
Input for the α-algorithm is a simple event log L over A , i.e., L ∈ B(A ).Inthe
remainder, we will simply refer to L as the event log. We refer to the elements of
A as activities, see Sect. 2.2. These activities will correspond to transitions in the
discovered Petri net. In this chapter, we will use the convention that capital letters
refer to sets of activities (e.g., A,B ⊆ A ), whereas for individual activities no cap-
italization is used (e.g., a,b,c,... ∈ A ). The output of the α-algorithm is a marked
Petri net, i.e., α(L) = (N,M). We aim at the discovery of WF-nets. Therefore, we
can omit the initial marking and write α(L) = N (the initial marking is implied;
M =[i]).
The α-algorithm scans the event log for particular patterns. For example, if ac-
tivity a is followed by b but b is never followed by a, then it is assumed that there is
a causal dependency between a and b. To reflect this dependency, the correspond-
ing Petri net should have a place connecting a to b. We distinguish four log-based
ordering relations that aim to capture relevant patterns in the log.