
Chapter 4
            Getting the Data

















            Process mining is impossible without proper event logs. This chapter describes the
            information that should be present in such event logs. Depending on the process
            mining technique used, these requirements may vary. The challenge is to extract
            such data from a variety of data sources, e.g., databases, flat files, message logs,
            transaction logs, ERP systems, and document management systems. When merging
            and extracting data, both syntax and semantics play an important role. Moreover,
            depending on the questions one seeks to answer, different views on the available
            data are needed.
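
            As a minimal sketch of why both syntax and semantics matter when merging,
            consider two small sources that describe the same cases in different layouts.
            The source contents, the field names (case, activity, time), and the helper
            functions below are hypothetical and chosen only for illustration; they are
            not a prescribed event log format.

```python
import csv
import io
from datetime import datetime

# Hypothetical source 1: a flat-file export with ISO 8601 timestamps.
ORDERS_CSV = """order_id,step,when
o1,register order,2011-01-03T09:15:00
o2,register order,2011-01-03T10:02:00
"""

# Hypothetical source 2: a message log with another layout.
# Semantics: we ASSUME the dates are day-first (03/01/2011 = 3 January 2011).
SHIPPING_LOG = """03/01/2011 11:30|o1|ship goods
04/01/2011 08:45|o2|ship goods
"""


def read_orders(text):
    """Map rows of the flat file onto a common event structure."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {
            "case": row["order_id"],
            "activity": row["step"],
            "time": datetime.fromisoformat(row["when"]),
        }


def read_shipping(text):
    """Map lines of the message log onto the same event structure."""
    for line in text.strip().splitlines():
        stamp, case, activity = line.split("|")
        yield {
            "case": case,
            "activity": activity,
            "time": datetime.strptime(stamp, "%d/%m/%Y %H:%M"),
        }


def merge_events(*sources):
    """Combine all sources and order events by case, then by timestamp."""
    events = [event for source in sources for event in source]
    return sorted(events, key=lambda e: (e["case"], e["time"]))


events = merge_events(read_orders(ORDERS_CSV), read_shipping(SHIPPING_LOG))
for event in events:
    print(event["case"], event["time"].isoformat(), event["activity"])
```

            Reading the second source with a month-first date pattern would still parse
            without error but would silently reorder events, which is the kind of
            semantic mismatch that has to be resolved during extraction.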



            4.1 Data Sources


            In Chap. 1, we introduced the concept of process mining. The idea is to analyze
            event data from a process-oriented perspective. The goal of process mining is to
            answer questions about operational processes. Examples are:
            • What really happened in the past?
            • Why did it happen?
            • What is likely to happen in the future?
            • When and why do organizations and people deviate?
            • How to control a process better?
            • How to redesign a process to improve its performance?
            In subsequent chapters, we will discuss various techniques to answer the preceding
            questions. However, first we focus on the event data needed.
              Figure 4.1 shows the overall “process mining workflow” emphasizing the role
            of event data. The starting point is the “raw” data hidden in all kinds of data sources.
            A data source may be a simple flat file, an Excel spreadsheet, a transaction log,
            or a database table. However, one should not expect all the data to be in a single
            well-structured data source. In reality, event data is typically scattered over
            different data sources, and considerable effort is often needed to collect the relevant
            W.M.P. van der Aalst, Process Mining,                           95
            DOI 10.1007/978-3-642-19345-3_4, © Springer-Verlag Berlin Heidelberg 2011