Chapter 4
Getting the Data
Process mining is impossible without proper event logs. This chapter describes the
information that should be present in such event logs. Depending on the process
mining technique used, these requirements may vary. The challenge is to extract
such data from a variety of data sources, e.g., databases, flat files, message logs,
transaction logs, ERP systems, and document management systems. When merging
and extracting data, both syntax and semantics play an important role. Moreover,
depending on the questions one seeks to answer, different views on the available
data are needed.
4.1 Data Sources
In Chap. 1, we introduced the concept of process mining. The idea is to analyze
event data from a process-oriented perspective. The goal of process mining is to
answer questions about operational processes. Examples are:
• What really happened in the past?
• Why did it happen?
• What is likely to happen in the future?
• When and why do organizations and people deviate?
• How to control a process better?
• How to redesign a process to improve its performance?
In subsequent chapters, we will discuss various techniques to answer the preceding
questions. However, first we focus on the event data needed.
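As a minimal sketch of what collecting event data from more than one source can involve, the following Python fragment reads events from a flat file and from a database table and merges them into a single list ordered by time. Everything specific in it is an assumption made for illustration: the column names (case_id, activity, timestamp), the table steps with its columns order_no, step, and done_at, the timestamp format, and the sample rows are all invented and do not come from any particular system.

```python
import csv
import io
import sqlite3
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M"  # assumed timestamp format for both sources

# Hypothetical flat-file export (invented sample data): one event per row.
CSV_EXPORT = """case_id,activity,timestamp
1,register request,2010-12-30 11:02
2,register request,2010-12-30 11:32
"""


def events_from_csv(text):
    """Read events from CSV text as (case_id, activity, timestamp) tuples."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        (row["case_id"], row["activity"],
         datetime.strptime(row["timestamp"], TS_FORMAT))
        for row in reader
    ]


def events_from_db(conn):
    """Read events from the hypothetical table 'steps'.

    The source uses different column names and an integer case identifier,
    so both are mapped onto the common (case_id, activity, timestamp) form.
    """
    rows = conn.execute("SELECT order_no, step, done_at FROM steps")
    return [
        (str(order_no), step, datetime.strptime(done_at, TS_FORMAT))
        for order_no, step, done_at in rows
    ]


def merge_events(*sources):
    """Combine events from several sources, ordered by timestamp, then case."""
    merged = [event for source in sources for event in source]
    merged.sort(key=lambda event: (event[2], event[0]))
    return merged


def build_demo_log():
    """Build a small merged event list from the two invented sources."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE steps (order_no INTEGER, step TEXT, done_at TEXT)")
    conn.executemany("INSERT INTO steps VALUES (?, ?, ?)", [
        (1, "examine thoroughly", "2010-12-31 10:06"),
        (2, "check ticket", "2010-12-30 12:12"),
    ])
    log = merge_events(events_from_csv(CSV_EXPORT), events_from_db(conn))
    conn.close()
    return log


if __name__ == "__main__":
    for case_id, activity, ts in build_demo_log():
        print(case_id, activity, ts.isoformat(sep=" "))
```

Even in this toy setting, both syntax (column names, data types, timestamp format) and semantics (which field identifies the case) have to be aligned before the two sources can be combined.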
Figure 4.1 shows the overall “process mining workflow” emphasizing the role
of event data. The starting point is the “raw” data hidden in all kinds of data sources.
A data source may be a simple flat file, an Excel spreadsheet, a transaction log,
or a database table. However, one should not expect all the data to be in a single
well-structured data source. In reality, event data is typically scattered over
different data sources, and considerable effort is often needed to collect the relevant
W.M.P. van der Aalst, Process Mining, 95
DOI 10.1007/978-3-642-19345-3_4, © Springer-Verlag Berlin Heidelberg 2011