Page 189 -

P. 189

6.3 Genetic Process Mining 171

Therefore, a more reﬁned ﬁtness function needs to be used that also rewards the
partial correctness of the model and takes into account all four competing quality
criteria described in Sect. 5.4.3. The best individuals, i.e., the process models having
the highest ﬁtness value are moved to the next generation. This is called elitism.For
instance, the best 1% of the current generation is passed on to the next generation
without any modiﬁcations. Through tournaments, “parents” are selected for creating
new individuals. Tournaments among individuals and elitism should make sure that
the “genetic material of the best process models” has the highest probability of being
used for the next generation: survival of the ﬁttest. As a result, individuals with
a poor ﬁtness are unlikely to survive. Figure 6.8 refers to such models as “dead”
individuals.
In the reproduction phase, the selected parent individuals are used to create
new offspring. Here two genetic operators are used: crossover and mutation.For
crossover two individuals are taken and used to create two new models; these end up
in the pool with “child models” shown in Fig. 6.8. These child models share parts of
the genetic material of their parents. The resulting children are then modiﬁed using
mutation, e.g., randomly adding or deleting a causal dependency. Mutation is used
to insert new generic material in the next generation. Without mutation, evolution
beyond the genetic material in the initial population is impossible.
Through reproduction (i.e., crossover and mutation) and elitism, a new genera-
tion is created. For the models in this generation the ﬁtness is computed. Again the
best individuals move on to the next round (elitism) or are used to produce new off-
spring. This is repeated and the expectation is that the “quality” of each generation
gets better and better. The evolution process terminates when a satisfactory solution
is found, i.e., a model having at least the desired ﬁtness. Depending on the event log
it may take a very long time to converge. In fact, due to the representational bias
and noise in the event log there may not be a model that has the desired level of ﬁt-
ness. Therefore, other termination criteria may be added (e.g., a maximum number
of generations or stopping when 10 successive generations do not produce better
individuals). When terminating, a model with the best ﬁtness is returned.
The approach described in Fig. 6.8 is very general. When actually implementing
a genetic process mining algorithm the following design choices need to made:
• Representation of individuals. Each individual corresponds to a process model
described in a particular language, e.g., Petri nets, C-nets, BPMN, or EPCs. This
choice is important as it determines the class of processes that can be discovered
(representational bias). Moreover, it should be possible to deﬁne suitable genetic
operators for the representation chosen. In [31], a variant of C-nets is used.
• Initialization. For the initial population, models need to be generated randomly.
In [31], two approaches are proposed: (a) an approach where with a certain prob-
ability a causal dependency between two activities is inserted to create C-nets
and (b) an approach in which a randomized variant of heuristic mining is used
to create an initial population with a higher average ﬁtness than purely randomly
generated C-nets.
• Fitness function. Here, the challenge is to deﬁne a function that balances the four
competing quality criteria described in Sect. 5.4.3. Many ﬁtness functions can be

184 185 186 187 188 189 190 191 192 193 194