Page 190 -
P. 190
172 6 Advanced Process Discovery Techniques
defined. The fitness function drives the evolution process and can be used to favor
particular models. In [31], the proportion of events in the log that can be parsed by
the model is computed. This is combined with penalties for having many enabled
activities (cf. the flower model in Fig. 5.23).
• Selection strategy (tournament and elitism). The genetic algorithm needs to de-
termine the fraction of individuals that go to the next round without any changes.
Through elitism, it is ensured that good models do not get lost due to crossover or
mutation. There are different approaches to select parents for crossover. In [31],
parents are selected by randomly taking five individuals and then selecting the
best one, i.e., a tournament among five randomly selected models is used.
• Crossover. The goal of crossover is to recombine existing genetic material. The
basic idea is to create a new process model that uses parts of its two parent models.
In [29, 31], both parents are C-nets having the same set of activities. One of
these common activities is selected randomly, say a.Let I 1 (a) and O 1 (a) be the
possible bindings of one parent, and let I 2 (a) and O 2 (a) be the potential bindings
of the other parent. Now parts of I 1 (a) are swapped with parts of I 2 (a) and parts
of O 1 (a) are swapped with parts of O 2(a). Subsequently, both C-nets are repaired
as bindings need to be consistent among activities. The crossover of two parent
models results in two new child models. These child models may be mutated
before being added to the next generation.
• Mutation. The goal of mutation is to randomly insert new genetic material. In [29,
31], each activity in each child resulting from crossover has a small probability of
being selected for mutation. If this is the case, say a is selected for mutation, then
I(a) or O(a) is randomly modified by adding or removing potential bindings.
The above list shows that many design decisions need to be taken when developing
a genetic process mining algorithm. We refer to [29, 31] for concrete examples.
An essential choice is the representation of individuals. The approach described in
[29, 31] uses a variant of C-nets similar to the notation used for the initial heuristic
mining algorithm [124]. However, many other representations are possible.
To illustrate the genetic operators, we show a crossover example and a mutation
example. For clarity we use Petri nets to describe the individuals before and after
modification. Figure 6.9 shows two “parent” models and two “child” models result-
ing from crossover. In this example, the crossover point is the line through activities
e and f . Figure 6.10 shows an example of mutation: one place is removed and one
arc is added.
Figures 6.9 and 6.10 nicely illustrate the idea behind the two genetic operators:
crossover and mutation. However, the realization of such operators is not as simple
as these examples suggest. Typically repair actions are needed after crossover and
mutation. For instance, the resulting model may no longer be a WF-net or C-net.
Again we refer to [29, 31] for concrete examples.
Genetic process mining is flexible and robust. Like heuristic mining techniques,
it can deal with noise and incompleteness. The approach can also be adapted and
extended easily. By changing the fitness function, it is possible to give preference to
particular constructs. Unfortunately, like most evolutionary approaches, genetic pro-
cess mining is not very efficient for larger models and logs. It may take a very long