Page 147 -
P. 147

5.2 A Simple Algorithm for Process Discovery                    129

            tion is related to the notion of overfitting. An overfitting model does not generalize
            enough, i.e., it is too specific and too much driven by examples in the event log. The
            fourth quality criterion is related to Occam’s Razor which states that “one should not
            increase, beyond what is necessary, the number of entities required to explain any-
            thing” (see Sect. 3.6.3). Following this principle, we look for the “simplest process
            model” that can explain what is observed in the event log.
              It turns out to be challenging to balance the four quality criteria. For instance, an
            oversimplified model is likely to have a low fitness or lack of precision. Moreover,
            there is an obvious trade-off between underfitting and overfitting. We discuss these
            four quality criteria later in this chapter. However, we first introduce a concrete
            process discovery algorithm.



            5.2 A Simple Algorithm for Process Discovery

            This section introduces the α-algorithm [103]. This algorithm is an example of a
            γ function as mentioned in Definition 5.2, i.e., given a simple event log it pro-
            duces a Petri net that (hopefully) can replay the log. The α-algorithm was one of the
            first process discovery algorithms that could adequately deal with concurrency (see
            Sect. 6.5). However, the α-algorithm should not be seen as a very practical mining
            technique as it has problems with noise, infrequent/incomplete behavior, and com-
            plex routing constructs. Nevertheless, it provides a good introduction into the topic.
            The α-algorithm is simple and many of its ideas have been embedded in more com-
            plex and robust techniques. We will use the algorithm as a baseline for discussing
            the challenges related to process discovery and for introducing more practical algo-
            rithms.




            5.2.1 Basic Idea

                                                                       ∗
            Input for the α-algorithm is a simple event log L over A , i.e., L ∈ B(A ).Inthe
            remainder, we will simply refer to L as the event log. We refer to the elements of
            A as activities, see Sect. 2.2. These activities will correspond to transitions in the
            discovered Petri net. In this chapter, we will use the convention that capital letters
            refer to sets of activities (e.g., A,B ⊆ A ), whereas for individual activities no cap-
            italization is used (e.g., a,b,c,... ∈ A ). The output of the α-algorithm is a marked
            Petri net, i.e., α(L) = (N,M). We aim at the discovery of WF-nets. Therefore, we
            can omit the initial marking and write α(L) = N (the initial marking is implied;
            M =[i]).
              The α-algorithm scans the event log for particular patterns. For example, if ac-
            tivity a is followed by b but b is never followed by a, then it is assumed that there is
            a causal dependency between a and b. To reflect this dependency, the correspond-
            ing Petri net should have a place connecting a to b. We distinguish four log-based
            ordering relations that aim to capture relevant patterns in the log.
   142   143   144   145   146   147   148   149   150   151   152