128 5 Process Discovery: An Introduction
Fig. 5.3 Two BPMN models: (a) the model corresponding to WF-net N1 discovered for L1, and
(b) the model corresponding to WF-net N2 discovered for L2
two trace-equivalent BPMN models shown in Fig. 5.3. Similarly, the discovered
models could have been translated into equivalent EPCs, UML activity diagrams,
statecharts, YAWL models, BPEL specifications, etc.
In the general problem formulation (Definition 5.1), we stated that the discovered
model should be “representative” of the behavior seen in the event log. In Definition 5.2,
this was operationalized by requiring that the model be able to replay all
behavior in this log, i.e., every trace in the event log must be a possible firing sequence of
the WF-net. This is the so-called “fitness” requirement. In general, there is a
trade-off between the following four quality criteria:
• Fitness: the discovered model should allow for the behavior seen in the event log.
• Precision: the discovered model should not allow for behavior completely
unrelated to what was seen in the event log.
• Generalization: the discovered model should generalize the example behavior
seen in the event log.
• Simplicity: the discovered model should be as simple as possible.
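The fitness requirement of Definition 5.2 and its tension with precision can be illustrated by playing the token game. The Python sketch below is only an illustration under stated assumptions: it uses a deliberately simple net encoding (a dict mapping each transition label to its input and output places), a hypothetical event log, and hypothetical nets that are not the nets of Fig. 5.3. A trace counts as fitting if it can be fired from one token in the initial place and ends with exactly one token in the final place. The helper names (`replays`, `trace_fitness`, `EXAMPLE_NET`, `FLOWER_NET`) are illustrative, and the "fraction of fitting traces" score is just one crude way to quantify fitness, chosen here for brevity.

```python
from collections import Counter

# Hypothetical nets for illustration only (not the nets of Fig. 5.3).
# Each net: transitions map label -> (input places, output places),
# plus an initial place and a final place.
EXAMPLE_NET = {
    "transitions": {
        "a": (["start"], ["p1", "p2"]),
        "b": (["p1"], ["p3"]),
        "c": (["p2"], ["p4"]),
        "e": (["p1", "p2"], ["p3", "p4"]),
        "d": (["p3", "p4"], ["end"]),
    },
    "initial": "start",
    "final": "end",
}

# Flower-style net (simplified assumption: a single place, no silent
# transitions), so every activity can fire at any time, in any order.
FLOWER_NET = {
    "transitions": {act: (["loop"], ["loop"]) for act in "abcde"},
    "initial": "loop",
    "final": "loop",
}


def replays(net, trace):
    """True if trace is a firing sequence from [initial] ending exactly in [final]."""
    marking = Counter([net["initial"]])
    for activity in trace:
        if activity not in net["transitions"]:
            return False  # activity does not appear in the model at all
        inputs, outputs = net["transitions"][activity]
        needed = Counter(inputs)
        # A transition is enabled only if every input place holds enough tokens.
        if any(marking[place] < n for place, n in needed.items()):
            return False
        marking -= needed            # consume input tokens
        marking += Counter(outputs)  # produce output tokens
    return marking == Counter([net["final"]])


def trace_fitness(net, log):
    """Share of traces (weighted by frequency) that replay completely."""
    total = sum(freq for _, freq in log)
    fitting = sum(freq for trace, freq in log if replays(net, trace))
    return fitting / total


# Hypothetical event log as (trace, frequency) pairs.
LOG = [
    (("a", "b", "c", "d"), 3),
    (("a", "c", "b", "d"), 2),
    (("a", "e", "d"), 1),
    (("a", "b", "d"), 1),  # d is never enabled because c did not fire
]

print(trace_fitness(EXAMPLE_NET, LOG))       # 6 of 7 traces fit
print(trace_fitness(FLOWER_NET, LOG))        # 1.0
print(replays(FLOWER_NET, ("d", "d", "a")))  # True
```

The flower-style net replays every trace in this log, so it scores perfectly on this fitness measure, yet it also accepts a trace such as d, d, a that has nothing to do with the observed behavior. That is exactly the poor precision (underfitting) that the precision criterion is meant to penalize.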
A model with good fitness is able to replay most of the traces in the log. Precision
is related to the notion of underfitting presented in the context of data mining
(see Sect. 3.6.3). A model with poor precision is underfitting, i.e., it allows for
behavior that is very different from what was seen in the event log. Generaliza-