Page 33 -
P. 33

1.4 Analyzing an Example Log                                    15













            Fig. 1.6 The process model discovered by the α-algorithm based on Cases 1 and 4, i.e., the set of
            traces { a,b,d,e,h , a,d,b,e,h }

              The Petri net shown in Fig. 1.5 also allows for traces not present in Table 1.2.For
            example, the traces  a,d,c,e,f,b,d,e,g  and  a,c,d,e,f,c,d,e,f,c,d,e,f,c,
            d,e,f,b,d,e,g  are also possible. This is a desired phenomenon as the goal is
            not to represent just the particular set of example traces in the event log. Process
            mining algorithms need to generalize the behavior contained in the log to show the
            most likely underlying model that is not invalidated by the next set of observations.
            One of the challenges of process mining is to balance between “overfitting” (the
            model is too specific and only allows for the “accidental behavior” observed) and
            “underfitting” (the model is too general and allows for behavior unrelated to the
            behavior observed).
              When comparing the event log and the model, there seems to be a good balance
            between “overfitting” and “underfitting”. All cases start with a and end with either
            g or h.Every e is preceded by d and one of the examination activities (b or c).
            Moreover, e is followed by f , g,or h. The repeated execution of b or c, d, and e
            suggests the presence of a loop. These characteristics are adequately captured by
            the net of Fig. 1.5.
              Let us now consider an event log consisting of only two traces  a,b,d,e,h  and
             a,d,b,e,h , i.e., Cases 1 and 4 of the original log. For this log, the α-algorithm
            constructs the Petri net shown in Fig. 1.6. This model only allows for two traces
            and these are exactly the ones in the small event log. b and d are modeled as being
            concurrent because they can be executed in any order. For larger and more complex
            models, it is important to discover concurrency. Not modeling concurrency typi-
            cally results in large “Spaghetti-like” models in which the same activity needs to be
            duplicated. 1
              The α-algorithm is just one of many possible process discovery algorithms. For
            real-life logs, more advanced algorithms are needed to better balance between “over-
            fitting” and “underfitting” and to deal with “incompleteness” (i.e., logs containing
            only a small fraction of the possible behavior due to the large number of alternatives)
            and “noise” (i.e., logs containing exceptional/infrequent behavior that should not au-
            tomatically be incorporated in the model). This book will describe several of such
            algorithms and guide the reader in selecting one. In this section, we used Petri nets


            1 See, for example, Figs.12.1 and 12.10 to understand why we use the term “Spaghetti” to refer to
            models that are difficult to comprehend.
   28   29   30   31   32   33   34   35   36   37   38