230 8 Mining Additional Perspectives
Note that process mining techniques do not create new data. The information
stored in event logs originates from other databases and audit trails. Therefore,
privacy and security issues already exist before applying process mining.
Nevertheless, the active use of data and process mining techniques increases the
risk of data misuse. Organizations should therefore continuously balance the
benefits of creating and using event data against potential privacy and security
problems.
8.4 Time and Probabilities
The time perspective is concerned with the timing and frequency of events. In most
event logs, events have a timestamp (#time(e)). The granularity of these timestamps
may vary. In some logs only date information is given, e.g., “30-12-2010”. Other
event logs have timestamps with millisecond precision. The presence of timestamps
enables the discovery of bottlenecks, the analysis of service levels, the monitoring
of resource utilization, and the prediction of remaining processing times of running
cases. In this section we focus on replaying event logs with timestamps. A small
modification of the replay approach presented in Sect. 7.2 suffices to include the
time perspective in process models.
Table 8.7 shows a fragment of some larger event log highlighting the role of
timestamps. To simplify the presentation, we use fictitious two-digit timestamps rather
than verbose timestamps like “30-12-2010:11.02”. Moreover, we assume that each
activity instance has a start event and a complete event. Obviously, the replay approach does
not depend on these simplifying assumptions.
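Logs with different timestamp granularities can be brought to a common representation before replay. The following is a minimal sketch, assuming the only formats present are the two quoted in the text ("30-12-2010" and "30-12-2010:11.02"); the function name parse_timestamp and the FORMATS tuple are hypothetical and not part of any process mining tool.

```python
from datetime import datetime

# Formats assumed from the two examples in the text, most precise first.
FORMATS = ("%d-%m-%Y:%H.%M", "%d-%m-%Y")

def parse_timestamp(text):
    """Parse a timestamp string, trying each assumed format in turn."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized timestamp: {text!r}")

coarse = parse_timestamp("30-12-2010")        # date only: time defaults to midnight
fine = parse_timestamp("30-12-2010:11.02")    # date plus hour and minute
print(coarse, fine)
```

A date-only timestamp is mapped to midnight here, so durations computed from such logs are only meaningful at day granularity.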
Figure 8.11 shows some raw diagnostic information after replaying the three
cases shown in Table 8.7. Activity a has three activity instances; one for each case.
The first instance of a runs from time 12 to time 19. Hence, the duration of this
activity instance is 7 time units. Activity d has four activity instances. For Case 3,
there are two instances of d; one running from time 35 to time 40 and one running
from time 62 to time 67. The durations of all activity instances are shown. Places
are also annotated to indicate how long tokens remained there. For example, there
were four periods in which a token resided in place p1: one token corresponding
to Case 1 resided in p1 for 6 time units (from time 19 until time 25), one token
corresponding to Case 2 resided in p1 for 7 time units (from time 23 until time 30),
and two tokens corresponding to Case 3 resided in this place (one for 32 − 30 = 2
time units and one for 60 − 55 = 5 time units). These times can be found using the
approach presented in Sect. 7.2. The only modifications are that now tokens bear
timestamps and statistics are collected during replay. In this example, all three cases
fit perfectly (i.e., no missing or remaining tokens). One needs to ignore non-fitting
events or cases to deal with logs that do not have a conformance of 100%. Heuristics
are needed to deal with such situations, but here we assume perfect fitness.
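The durations and waiting times quoted above can be reproduced with a few lines of code. The sketch below is a simplification under stated assumptions: it pairs start and complete events per case and activity instead of replaying tokens on the model, and it takes the periods for place p1 directly from the text rather than deriving them. Only the values mentioned in the text are used; the tuple layout and the function name activity_durations are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# (case, activity, transaction type, time); values are those quoted in the text.
events = [
    (1, "a", "start", 12), (1, "a", "complete", 19),
    (3, "d", "start", 35), (3, "d", "complete", 40),
    (3, "d", "start", 62), (3, "d", "complete", 67),
]

def activity_durations(events):
    """Pair each start with the next complete of the same case and activity."""
    open_starts = defaultdict(list)   # (case, activity) -> pending start times
    durations = defaultdict(list)     # activity -> durations of its instances
    for case, activity, kind, time in sorted(events, key=lambda e: e[3]):
        key = (case, activity)
        if kind == "start":
            open_starts[key].append(time)
        elif open_starts[key]:
            durations[activity].append(time - open_starts[key].pop(0))
        # A complete event without a pending start is skipped (non-fitting).
    return dict(durations)

# Periods in which a token resided in place p1: (case, arrival, departure).
p1_periods = [(1, 19, 25), (2, 23, 30), (3, 30, 32), (3, 55, 60)]
p1_waits = [leave - arrive for _, arrive, leave in p1_periods]

print(activity_durations(events))
print(p1_waits, mean(p1_waits))
```

For the events listed, this yields a duration of 7 for the instance of a and 5 for both instances of d in Case 3, and waiting times of 6, 7, 2, and 5 time units in p1, matching the figures given in the text.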
Figure 8.12 shows another view on the information gathered while replaying the
three cases. Consider for instance Case 3. For this case, an instance of activity a was