Page 33 -
P. 33
1.4 Analyzing an Example Log 15
Fig. 1.6 The process model discovered by the α-algorithm based on Cases 1 and 4, i.e., the set of
traces { a,b,d,e,h , a,d,b,e,h }
The Petri net shown in Fig. 1.5 also allows for traces not present in Table 1.2.For
example, the traces a,d,c,e,f,b,d,e,g and a,c,d,e,f,c,d,e,f,c,d,e,f,c,
d,e,f,b,d,e,g are also possible. This is a desired phenomenon as the goal is
not to represent just the particular set of example traces in the event log. Process
mining algorithms need to generalize the behavior contained in the log to show the
most likely underlying model that is not invalidated by the next set of observations.
One of the challenges of process mining is to balance between “overfitting” (the
model is too specific and only allows for the “accidental behavior” observed) and
“underfitting” (the model is too general and allows for behavior unrelated to the
behavior observed).
When comparing the event log and the model, there seems to be a good balance
between “overfitting” and “underfitting”. All cases start with a and end with either
g or h.Every e is preceded by d and one of the examination activities (b or c).
Moreover, e is followed by f , g,or h. The repeated execution of b or c, d, and e
suggests the presence of a loop. These characteristics are adequately captured by
the net of Fig. 1.5.
Let us now consider an event log consisting of only two traces a,b,d,e,h and
a,d,b,e,h , i.e., Cases 1 and 4 of the original log. For this log, the α-algorithm
constructs the Petri net shown in Fig. 1.6. This model only allows for two traces
and these are exactly the ones in the small event log. b and d are modeled as being
concurrent because they can be executed in any order. For larger and more complex
models, it is important to discover concurrency. Not modeling concurrency typi-
cally results in large “Spaghetti-like” models in which the same activity needs to be
duplicated. 1
The α-algorithm is just one of many possible process discovery algorithms. For
real-life logs, more advanced algorithms are needed to better balance between “over-
fitting” and “underfitting” and to deal with “incompleteness” (i.e., logs containing
only a small fraction of the possible behavior due to the large number of alternatives)
and “noise” (i.e., logs containing exceptional/infrequent behavior that should not au-
tomatically be incorporated in the model). This book will describe several of such
algorithms and guide the reader in selecting one. In this section, we used Petri nets
1 See, for example, Figs.12.1 and 12.10 to understand why we use the term “Spaghetti” to refer to
models that are difficult to comprehend.