
Chapter 5
            Process Discovery: An Introduction

            Process discovery is one of the most challenging process mining tasks. Based on an
            event log, a process model is constructed, thus capturing the behavior seen in the log.
            This chapter introduces the topic using the rather naïve α-algorithm. This algorithm
            nicely illustrates some of the general ideas used by many process mining algorithms
            and helps to understand the notion of process discovery. Moreover, the α-algorithm
            serves as a stepping stone for discussing challenges related to process discovery.



            5.1 Problem Statement

            As discussed in Chap. 1, there are three types of process mining: discovery,
            conformance, and enhancement. Moreover, we identified various perspectives, e.g.,
            the control-flow perspective, the organizational or resource perspective, the data
            perspective, and the time perspective. In this chapter, we focus on the discovery
            task and the control-flow perspective. This combination is often referred to as
            process discovery. The general process discovery problem can be formulated as follows.

            Definition 5.1 (General process discovery problem) Let L be an event log as
            defined in Definition 4.3 or as specified by the XES standard (cf. Sect. 4.3). A
            process discovery algorithm is a function that maps L onto a process model such
            that the model is “representative” of the behavior seen in the event log. The
            challenge is to find such an algorithm.
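
              The shape of Definition 5.1 can be sketched in code. The sketch below is only an
            illustration under stated assumptions: an event log is simplified to a multiset of
            traces over activity names, and a set of directly-follows pairs stands in for a
            process model. The names Trace, EventLog, Model, DiscoveryAlgorithm, and
            directly_follows are hypothetical and do not come from the text; the function is
            not the α-algorithm and makes no claim to produce a “representative” model.

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, Tuple

# Assumption: a trace is a sequence of activity names, and an event log
# is a multiset of traces (trace -> number of cases with that trace).
Trace = Tuple[str, ...]
EventLog = Dict[Trace, int]

# Assumption: a set of (x, y) pairs, meaning "x was directly followed by y",
# stands in for a process model. A real target would be a Petri net, BPMN, etc.
Model = FrozenSet[Tuple[str, str]]

# Definition 5.1 as a type: any function from event logs to models.
DiscoveryAlgorithm = Callable[[EventLog], Model]


def directly_follows(log: EventLog) -> Model:
    """Collect every pair of activities that occur consecutively in some trace."""
    pairs = set()
    for trace in log:
        pairs.update(zip(trace, trace[1:]))
    return frozenset(pairs)


discover: DiscoveryAlgorithm = directly_follows

# Made-up example log with three distinct traces.
log = Counter({
    ("a", "b", "c", "d"): 3,
    ("a", "c", "b", "d"): 2,
    ("a", "e", "d"): 1,
})
model = discover(log)
print(sorted(model))
```

              Any function with this signature counts as a discovery algorithm in the sense of
            the definition; whether its output is “representative” is exactly the part the
            definition leaves open.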

              This definition does not specify what kind of process model should be generated,
            e.g., a BPMN, EPC, YAWL, or Petri net model. Moreover, event logs with potentially
            many attributes may be used as input. Recall that the XES format allows for storing
            information related to all perspectives, whereas here the focus is on the
            control-flow perspective. The only requirement is that the model is “representative”
            of the behavior in the log, but it is unclear what this means.
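
              Restricting a “rich” event log to the control-flow perspective can be sketched as
            follows. This is a minimal illustration, assuming events are flat records; the
            function name project_control_flow and the "case" key are hypothetical, while
            "concept:name", "time:timestamp", and "org:resource" follow the standard XES
            attribute keys. The sample events are made up, with integers in place of real
            timestamps.

```python
from collections import Counter, defaultdict


def project_control_flow(events, case_key="case",
                         activity_key="concept:name",
                         time_key="time:timestamp"):
    """Reduce attribute-rich events to a multiset of activity-name traces."""
    # Group events by the case they belong to.
    by_case = defaultdict(list)
    for event in events:
        by_case[event[case_key]].append(event)

    # Order each case by time and keep only the activity names;
    # all other attributes (resource, data, ...) are dropped.
    traces = Counter()
    for case_events in by_case.values():
        ordered = sorted(case_events, key=lambda e: e[time_key])
        traces[tuple(e[activity_key] for e in ordered)] += 1
    return traces


# Made-up events, deliberately not in chronological order.
events = [
    {"case": 1, "concept:name": "b", "time:timestamp": 2, "org:resource": "Sue"},
    {"case": 1, "concept:name": "a", "time:timestamp": 1, "org:resource": "Pete"},
    {"case": 2, "concept:name": "a", "time:timestamp": 1, "org:resource": "Mike"},
    {"case": 1, "concept:name": "d", "time:timestamp": 3, "org:resource": "Sara"},
    {"case": 2, "concept:name": "c", "time:timestamp": 2, "org:resource": "Sue"},
    {"case": 2, "concept:name": "d", "time:timestamp": 4, "org:resource": "Pete"},
    {"case": 3, "concept:name": "a", "time:timestamp": 5, "org:resource": "Pete"},
    {"case": 3, "concept:name": "b", "time:timestamp": 6, "org:resource": "Mike"},
    {"case": 3, "concept:name": "d", "time:timestamp": 7, "org:resource": "Sara"},
]
simple_log = project_control_flow(events)
print(simple_log)
```

              The projection discards the resource, data, and time perspectives except for the
            ordering that timestamps induce within a case.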
              Definition 5.1 is rather broad and vague. The target format is not specified and
            a potentially “rich” event log is used as input without specifying tangible require-
            W.M.P. van der Aalst, Process Mining,
            DOI 10.1007/978-3-642-19345-3_5, © Springer-Verlag Berlin Heidelberg 2011