Page 189 -
P. 189

6.3 Genetic Process Mining                                      171

            Therefore, a more refined fitness function needs to be used that also rewards the
            partial correctness of the model and takes into account all four competing quality
            criteria described in Sect. 5.4.3. The best individuals, i.e., the process models having
            the highest fitness value are moved to the next generation. This is called elitism.For
            instance, the best 1% of the current generation is passed on to the next generation
            without any modifications. Through tournaments, “parents” are selected for creating
            new individuals. Tournaments among individuals and elitism should make sure that
            the “genetic material of the best process models” has the highest probability of being
            used for the next generation: survival of the fittest. As a result, individuals with
            a poor fitness are unlikely to survive. Figure 6.8 refers to such models as “dead”
            individuals.
              In the reproduction phase, the selected parent individuals are used to create
            new offspring. Here two genetic operators are used: crossover and mutation.For
            crossover two individuals are taken and used to create two new models; these end up
            in the pool with “child models” shown in Fig. 6.8. These child models share parts of
            the genetic material of their parents. The resulting children are then modified using
            mutation, e.g., randomly adding or deleting a causal dependency. Mutation is used
            to insert new generic material in the next generation. Without mutation, evolution
            beyond the genetic material in the initial population is impossible.
              Through reproduction (i.e., crossover and mutation) and elitism, a new genera-
            tion is created. For the models in this generation the fitness is computed. Again the
            best individuals move on to the next round (elitism) or are used to produce new off-
            spring. This is repeated and the expectation is that the “quality” of each generation
            gets better and better. The evolution process terminates when a satisfactory solution
            is found, i.e., a model having at least the desired fitness. Depending on the event log
            it may take a very long time to converge. In fact, due to the representational bias
            and noise in the event log there may not be a model that has the desired level of fit-
            ness. Therefore, other termination criteria may be added (e.g., a maximum number
            of generations or stopping when 10 successive generations do not produce better
            individuals). When terminating, a model with the best fitness is returned.
              The approach described in Fig. 6.8 is very general. When actually implementing
            a genetic process mining algorithm the following design choices need to made:
            • Representation of individuals. Each individual corresponds to a process model
              described in a particular language, e.g., Petri nets, C-nets, BPMN, or EPCs. This
              choice is important as it determines the class of processes that can be discovered
              (representational bias). Moreover, it should be possible to define suitable genetic
              operators for the representation chosen. In [31], a variant of C-nets is used.
            • Initialization. For the initial population, models need to be generated randomly.
              In [31], two approaches are proposed: (a) an approach where with a certain prob-
              ability a causal dependency between two activities is inserted to create C-nets
              and (b) an approach in which a randomized variant of heuristic mining is used
              to create an initial population with a higher average fitness than purely randomly
              generated C-nets.
            • Fitness function. Here, the challenge is to define a function that balances the four
              competing quality criteria described in Sect. 5.4.3. Many fitness functions can be
   184   185   186   187   188   189   190   191   192   193   194