Page 248 -

230                                       8  Mining Additional Perspectives


              Note that process mining techniques do not create new data. The information
              stored in event logs originates from other databases and audit trails. Therefore,
              privacy and security issues already exist before applying process mining. Nev-
              ertheless, the active use of data and process mining techniques increases the
              risk of data misuse. Organizations should therefore continuously balance the
              benefits of creating and using event data against potential privacy and security
              problems.




            8.4 Time and Probabilities
            The time perspective is concerned with the timing and frequency of events. In most
            event logs, events have a timestamp (#time(e)). The granularity of these timestamps
            may vary. In some logs only date information is given, e.g., “30-12-2010”. Other
            event logs have timestamps with millisecond precision. The presence of timestamps
            enables the discovery of bottlenecks, the analysis of service levels, the monitoring
            of resource utilization, and the prediction of remaining processing times of running
            cases. In this section we focus on replaying event logs with timestamps. A small
            modification of the replay approach presented in Sect. 7.2 suffices to include the
            time perspective in process models.
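
            Since timestamp granularity varies between logs, a replay tool first has to
            normalize timestamps. The following is a minimal sketch, assuming only the two
            notations quoted above ("30-12-2010" and "30-12-2010:11.02"); the names FORMATS
            and parse_timestamp are hypothetical and not part of any process mining tool.

```python
from datetime import datetime

# Hypothetical list of accepted notations, most precise first.
# Extend it for logs that use other notations (e.g., millisecond precision).
FORMATS = ["%d-%m-%Y:%H.%M", "%d-%m-%Y"]


def parse_timestamp(text):
    """Return a datetime for the first notation in FORMATS that matches text."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # try the next, coarser notation
    raise ValueError(f"unrecognized timestamp: {text!r}")


date_only = parse_timestamp("30-12-2010")          # time defaults to midnight
with_time = parse_timestamp("30-12-2010:11.02")
print(with_time - date_only)
```

            A date-only timestamp is mapped to midnight, so durations computed from such
            logs are only meaningful at the granularity of days.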
              Table 8.7 shows a fragment of some larger event log highlighting the role of
            timestamps. To simplify the presentation, we use fictive two-digit timestamps rather
            than verbose timestamps like “30-12-2010:11.02”. Moreover, we assume that each
            activity instance has a start event and a complete event. Obviously, the replay approach does
            not depend on these simplifying assumptions.
              Figure 8.11 shows some raw diagnostic information after replaying the three
            cases shown in Table 8.7. Activity a has three activity instances; one for each case.
            The first instance of a runs from time 12 to time 19. Hence, the duration of this
            activity instance is 7 time units. Activity d has four activity instances. For Case 3,
            there are two instances of d; one running from time 35 to time 40 and one running
            from time 62 to time 67. The durations of all activity instances are shown. Places
            are also annotated to indicate how long tokens remained there. For example, there
            were four periods in which a token resided in place p1: one token corresponding
            to Case 1 resided in p1 for 6 time units (from time 19 until time 25), one token
            corresponding to Case 2 resided in p1 for 7 time units (from time 23 until time 30),
            and two tokens corresponding to Case 3 resided in this place (one for 32 − 30 = 2
            time units and one for 60 − 55 = 5 time units). These times can be found using the
            approach presented in Sect. 7.2. The only modifications are that now tokens bear
            timestamps and statistics are collected during replay. In this example, all three cases
            fit perfectly (i.e., no missing or remaining tokens). One needs to ignore non-fitting
            events or cases to deal with logs that do not have a conformance of 100%. Heuristics
            are needed to deal with such situations, but here we assume perfect fitness.
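
            The statistics collected during replay can be sketched as follows. This is an
            illustration rather than the replay algorithm of Sect. 7.2: it uses only the
            events and token intervals quoted in the text above (not the full Table 8.7),
            assumes perfect fitness, and takes the token arrival and departure times for
            place p1 as given instead of deriving them from a Petri net. The names
            activity_durations, events, and p1_tokens are hypothetical.

```python
from collections import defaultdict

# (case, activity, lifecycle, time): only the events quoted in the text,
# not the complete log fragment of Table 8.7.
events = [
    (1, "a", "start", 12), (1, "a", "complete", 19),
    (3, "d", "start", 35), (3, "d", "complete", 40),
    (3, "d", "start", 62), (3, "d", "complete", 67),
]


def activity_durations(events):
    """Pair each complete event with the earliest open start event
    of the same case and activity, and collect durations per activity."""
    open_starts = defaultdict(list)
    durations = defaultdict(list)
    for case, activity, lifecycle, time in sorted(events, key=lambda e: e[3]):
        key = (case, activity)
        if lifecycle == "start":
            open_starts[key].append(time)
        elif open_starts[key]:  # ignore a complete event without a start
            durations[activity].append(time - open_starts[key].pop(0))
    return dict(durations)


# (case, arrival, departure) of the tokens in place p1, as listed in the text.
p1_tokens = [(1, 19, 25), (2, 23, 30), (3, 30, 32), (3, 55, 60)]
p1_waits = [departure - arrival for _, arrival, departure in p1_tokens]

print(activity_durations(events))      # {'a': [7], 'd': [5, 5]}
print(p1_waits)                        # [6, 7, 2, 5]
print(sum(p1_waits) / len(p1_waits))   # 5.0
```

            The four waiting times for p1 match the values given in the text, and their
            average is 5 time units. Aggregates such as the minimum, maximum, and variance
            can be derived from the same lists.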
              Figure 8.12 shows another view on the information gathered while replaying the
            three cases. Consider for instance Case 3. For this case, an instance of activity a was