may be interested in the discovery of patient flows, i.e., typical diagnosis and treat-
ment paths. However, one may also be interested in optimizing the workflow within
the radiology department. Both questions require different event logs, although
some events may be shared among the two required event logs. Once an event log
is created, it is typically filtered. Filtering is an iterative process. Coarse-grained
scoping was done when extracting the data into an event log. Filtering corresponds
to fine-grained scoping based on initial analysis results. For example, for process
discovery one can decide to focus on the 10 most frequent activities to keep the
model manageable.
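
The filtering step mentioned above, keeping only the most frequent activities, could be sketched as follows. This is a minimal illustration rather than a prescribed implementation: the dictionary-based event representation, the function name `filter_top_activities`, and the toy log are assumptions made for the example.

```python
from collections import Counter


def filter_top_activities(events, k=10):
    """Keep only events whose activity is among the k most frequent ones."""
    counts = Counter(event["activity"] for event in events)
    keep = {activity for activity, _ in counts.most_common(k)}
    return [event for event in events if event["activity"] in keep]


# Toy log (illustrative values only, not taken from Table 4.1).
events = [
    {"case_id": 1, "activity": "register request"},
    {"case_id": 1, "activity": "check ticket"},
    {"case_id": 2, "activity": "register request"},
    {"case_id": 2, "activity": "check ticket"},
    {"case_id": 2, "activity": "reject"},
]

# With k=2 the least frequent activity ("reject") is filtered out.
filtered = filter_top_activities(events, k=2)
```

Because filtering is iterative, a threshold such as `k` would typically be revisited after inspecting the first discovered model.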
Based on the filtered log, the different types of process mining described in
Sect. 1.3 can be applied: discovery, conformance, and enhancement.
Although Fig. 4.1 does not reflect the iterative nature of the whole process well,
it should be noted that process mining results most likely trigger new questions and
these questions may lead to the exploration of new data sources and more detailed
data extractions. Typically, several iterations of the extraction, filtering, and mining
phases are needed.
4.2 Event Logs
Table 4.1 shows a fragment of the event log already discussed in Chap. 1. This
table illustrates the typical information present in an event log used for process
mining. The table shows events related to the handling of requests for compensa-
tion. We assume that an event log contains data related to a single process, i.e., the
first coarse-grained scoping step in Fig. 4.1 should make sure that all events can be
related to this process. Moreover, each event in the log needs to refer to a single pro-
cess instance, often referred to as case. In Table 4.1, each request corresponds to a
case, e.g., Case 1. We also assume that events can be related to some activity. In
Table 4.1, events refer to activities like register request, check ticket, and reject. These
assumptions are quite natural in the context of process mining. All mainstream pro-
cess modeling notations, including the ones discussed in Chap. 2, specify a process
as a collection of activities such that the life-cycle of a single instance is described.
Hence, the “case id” and “activity” columns in Table 4.1 represent the bare mini-
mum for process mining. Moreover, events within a case need to be ordered. For
example, event 35654423 (the execution of activity register request for Case 1) oc-
curs before event 35654424 (the execution of activity examine thoroughly for the
same case). Without ordering information, it is of course impossible to discover
causal dependencies in process models.
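
To make the minimal requirements concrete (a case id, an activity, and an ordering of events within each case), the following sketch groups events into one ordered activity sequence per case. The field names, the function `traces_by_case`, and the assumption that event identifiers increase with execution order are choices made for this illustration; only the two Case 1 events are taken from the text, and the third event is hypothetical.

```python
from collections import defaultdict


def traces_by_case(events):
    """Group events per case and return the ordered activity sequence of each case.

    Assumes that a smaller event id means the event occurred earlier.
    """
    grouped = defaultdict(list)
    for event in sorted(events, key=lambda e: e["event_id"]):
        grouped[event["case_id"]].append(event["activity"])
    return dict(grouped)


events = [
    # Deliberately listed out of order to show that ordering is restored.
    {"event_id": 35654424, "case_id": 1, "activity": "examine thoroughly"},
    {"event_id": 35654423, "case_id": 1, "activity": "register request"},
    # Hypothetical event of a second case, added only for illustration.
    {"event_id": 35654500, "case_id": 2, "activity": "register request"},
]

traces = traces_by_case(events)
# traces[1] is ["register request", "examine thoroughly"]
```

In practice, a timestamp may serve as the ordering key instead of the event identifier.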
Table 4.1 also shows additional information per event. For example, all events
have a timestamp (i.e., date and time information such as “30-12-2010:11.02”). This
information is useful when analyzing performance-related properties, e.g., the waiting
time between two activities. The events in Table 4.1 also refer to resources, i.e.,
the persons executing the activities. Costs are also associated with events. In the context
of process mining, these properties are referred to as attributes. These attributes
are similar to the notion of variables in Chap. 3.
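
As an example of how the timestamp attribute supports performance analysis, the sketch below computes the time between two events. It assumes the timestamp layout quoted above ("30-12-2010:11.02", i.e., day-month-year:hour.minute); the function name `waiting_minutes` and the second timestamp are illustrative assumptions.

```python
from datetime import datetime

# Layout assumed from the example "30-12-2010:11.02": day-month-year:hour.minute.
TIMESTAMP_FORMAT = "%d-%m-%Y:%H.%M"


def waiting_minutes(earlier, later):
    """Return the number of minutes between two timestamp strings."""
    start = datetime.strptime(earlier, TIMESTAMP_FORMAT)
    end = datetime.strptime(later, TIMESTAMP_FORMAT)
    return (end - start).total_seconds() / 60


# The first timestamp is quoted in the text; the second is illustrative.
minutes = waiting_minutes("30-12-2010:11.02", "31-12-2010:10.06")
# minutes is 1384.0, i.e., 23 hours and 4 minutes
```

Other attributes, such as resource or cost, can be stored alongside the timestamp in the same per-event record.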