
128 5 Process Discovery: An Introduction

Fig. 5.3 Two BPMN models: (a) the model corresponding to WF-net N1 discovered for L1, and
(b) the model corresponding to WF-net N2 discovered for L2

two trace-equivalent BPMN models shown in Fig. 5.3. Similarly, the discovered
models could have been translated into equivalent EPCs, UML activity diagrams,
statecharts, YAWL models, BPEL specifications, etc.
In the general problem formulation (Definition 5.1), we stated that the discovered
model should be “representative” for the behavior seen in the event log. In
Definition 5.2, this was operationalized by requiring that the model is able to replay
all behavior in this log, i.e., any trace in the event log is a possible firing sequence
of the WF-net. This is the so-called “fitness” requirement. In general, there is a
trade-off between the following four quality criteria:
• Fitness: the discovered model should allow for the behavior seen in the event log.
• Precision: the discovered model should not allow for behavior completely
unrelated to what was seen in the event log.
• Generalization: the discovered model should generalize the example behavior
seen in the event log.
• Simplicity: the discovered model should be as simple as possible.
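
The fitness requirement can be made concrete with a small replay sketch. The net and the log below are hypothetical toy examples chosen for illustration, not taken from the figures in this chapter, and the names NET, LOG, replays, and trace_fitness are assumptions of this sketch. A trace is counted as fitting only if it can be fired step by step from the initial marking [start] and ends with exactly one token in [end]; this is one possible reading of "the model is able to replay the trace".

```python
from collections import Counter

# Hypothetical toy WF-net: transition -> (input places, output places).
# After a, either b and c run in any order, or e replaces both; d closes.
NET = {
    "a": ({"start"}, {"p1", "p2"}),
    "b": ({"p1"}, {"p3"}),
    "c": ({"p2"}, {"p4"}),
    "e": ({"p1", "p2"}, {"p3", "p4"}),
    "d": ({"p3", "p4"}, {"end"}),
}

# Hypothetical event log as a multiset of traces (trace -> frequency).
LOG = {
    ("a", "b", "c", "d"): 3,
    ("a", "c", "b", "d"): 2,
    ("a", "e", "d"): 1,
    ("a", "b", "d"): 1,  # skips c, so d is never enabled in NET
}


def replays(trace, net=NET, source="start", sink="end"):
    """Return True if trace is a firing sequence leading from [source] to [sink]."""
    marking = Counter({source: 1})
    for activity in trace:
        if activity not in net:
            return False  # activity unknown to the model
        inputs, outputs = net[activity]
        if any(marking[place] < 1 for place in inputs):
            return False  # transition is not enabled in the current marking
        for place in inputs:
            marking[place] -= 1  # consume one token per input place
        for place in outputs:
            marking[place] += 1  # produce one token per output place
    # Unary plus drops places with zero tokens before comparing markings.
    return +marking == Counter({sink: 1})


def trace_fitness(log, net=NET):
    """Fraction of traces (counted with frequency) that the net can replay."""
    total = sum(log.values())
    fitting = sum(count for trace, count in log.items() if replays(trace, net))
    return fitting / total


if __name__ == "__main__":
    print(trace_fitness(LOG))
```

This trace-level ratio is deliberately coarse: a trace that deviates in a single event counts the same as one that deviates everywhere. It is meant only to illustrate the replay idea behind the fitness criterion, not to serve as a definitive fitness metric.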
A model having a good fitness is able to replay most of the traces in the log.
Precision is related to the notion of underfitting presented in the context of data
mining (see Sect. 3.6.3). A model having a poor precision is underfitting, i.e., it allows for
behavior that is very different from what was seen in the event log. Generaliza-
