96 4 Getting the Data
Fig. 4.1 Overview describing the workflow of getting from heterogeneous data sources to process
mining results
data. Consider, for example, a full SAP implementation that typically has more than
10,000 tables. Data may be scattered due to technical or organizational reasons. For
example, there may be legacy systems holding crucial data or information systems
used only at the departmental level. For cross-organizational process mining, e.g.,
to analyze supply chains, data may even be scattered over multiple organizations.
Events can also be captured by tapping message exchanges [107] (e.g., SOAP
messages) and by recording read and write actions [36]. Data sources may be
structured and well-described by metadata. Unfortunately, in many situations, the data is
unstructured or important meta data is missing. Data may originate from web pages,