Page 198 -
P. 198
180 6 Advanced Process Discovery Techniques
Fig. 6.18 Region
R = (X,Y,c) corresponding
to place p R : X ={a1,a2,
c1}=•p R , Y ={b1,b2,
c1}= p R •,and c is the initial
marking of p R
sentation function like l state () definitely results in an overfitting model that can only
1
replay the log without any form of generalization. Many abstractions (i.e., state rep-
resentation functions) are possible to balance between overfitting and underfitting.
In [111], these are described systematically.
6.4.3 Process Discovery Using Language-Based Regions
As illustrated by Fig. 6.16, the goal of state-based regions is to determine the places
in a Petri net. Language-based regions also aim at finding such places but do not
use a transition system as input; instead some “language” is used as input. Several
techniques and variants of the problem have been defined. In this section we only
present the basic idea and refer to literature for details [7, 8, 18, 115].
Suppose, we have an event log in which the events refer to a set of activities A.
For this log one could construct a Petri net N ∅ with the set of transitions being A
and no places. Since a transition without any input places is continuously enabled,
this Petri net is able to reproduce the log. In fact, the Petri net N ∅ can reproduce
any log over A. In Sect. 5.4.3 we referred to such a model as the “flower model”.
There we added places and transitions to model this behavior in terms of a WF-net.
However, the idea is the same. Adding places to the Petri net N ∅ can only limit the
behavior.
Consider for example place p R in Fig. 6.18. Removing place p R will not remove
any behavior. However, adding p R may remove behavior possible in the Petri net
without this place. The behavior gets restricted when a place is empty while one of
its output transitions wants to consume a token from it. For example, b1 is blocked
if p R is unmarked while all other input places of b1 are marked. Suppose now that
we have a set of traces L. If these traces are possible in the net with place p R , then
they are also possible in the net without p R . The reverse does not always hold. This
triggers the question whether p R can be added without disabling any of the traces
in L. This is what regions are all about.
∗
Definition 6.4 (Language-based region) Let L ∈ B(A ) be a simple event log. R =
(X,Y,c) is a region of L if and only if: