Page 190 -
P. 190

172                                 6  Advanced Process Discovery Techniques

              defined. The fitness function drives the evolution process and can be used to favor
              particular models. In [31], the proportion of events in the log that can be parsed by
              the model is computed. This is combined with penalties for having many enabled
              activities (cf. the flower model in Fig. 5.23).
            • Selection strategy (tournament and elitism). The genetic algorithm needs to de-
              termine the fraction of individuals that go to the next round without any changes.
              Through elitism, it is ensured that good models do not get lost due to crossover or
              mutation. There are different approaches to select parents for crossover. In [31],
              parents are selected by randomly taking five individuals and then selecting the
              best one, i.e., a tournament among five randomly selected models is used.
            • Crossover. The goal of crossover is to recombine existing genetic material. The
              basic idea is to create a new process model that uses parts of its two parent models.
              In [29, 31], both parents are C-nets having the same set of activities. One of
              these common activities is selected randomly, say a.Let I 1 (a) and O 1 (a) be the
              possible bindings of one parent, and let I 2 (a) and O 2 (a) be the potential bindings
              of the other parent. Now parts of I 1 (a) are swapped with parts of I 2 (a) and parts
              of O 1 (a) are swapped with parts of O 2(a). Subsequently, both C-nets are repaired
              as bindings need to be consistent among activities. The crossover of two parent
              models results in two new child models. These child models may be mutated
              before being added to the next generation.
            • Mutation. The goal of mutation is to randomly insert new genetic material. In [29,
              31], each activity in each child resulting from crossover has a small probability of
              being selected for mutation. If this is the case, say a is selected for mutation, then
              I(a) or O(a) is randomly modified by adding or removing potential bindings.
            The above list shows that many design decisions need to be taken when developing
            a genetic process mining algorithm. We refer to [29, 31] for concrete examples.
            An essential choice is the representation of individuals. The approach described in
            [29, 31] uses a variant of C-nets similar to the notation used for the initial heuristic
            mining algorithm [124]. However, many other representations are possible.
              To illustrate the genetic operators, we show a crossover example and a mutation
            example. For clarity we use Petri nets to describe the individuals before and after
            modification. Figure 6.9 shows two “parent” models and two “child” models result-
            ing from crossover. In this example, the crossover point is the line through activities
            e and f . Figure 6.10 shows an example of mutation: one place is removed and one
            arc is added.
              Figures 6.9 and 6.10 nicely illustrate the idea behind the two genetic operators:
            crossover and mutation. However, the realization of such operators is not as simple
            as these examples suggest. Typically repair actions are needed after crossover and
            mutation. For instance, the resulting model may no longer be a WF-net or C-net.
            Again we refer to [29, 31] for concrete examples.
              Genetic process mining is flexible and robust. Like heuristic mining techniques,
            it can deal with noise and incompleteness. The approach can also be adapted and
            extended easily. By changing the fitness function, it is possible to give preference to
            particular constructs. Unfortunately, like most evolutionary approaches, genetic pro-
            cess mining is not very efficient for larger models and logs. It may take a very long
   185   186   187   188   189   190   191   192   193   194   195