8.3 Organizational Mining
In Table 8.3, we abstracted from transaction types, i.e., we did not consider the
start and completion of an activity instance. Most logs will contain such information.
For example, Table 8.1 shows the start and completion of each activity instance.
Some logs will even show when a work item is offered to a resource or when it is
assigned. If such events are recorded, then a diagram such as Fig. 8.10 can also show
detailed time-related information. For example, the utilization and response times of
resources can be shown.
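The following Python sketch illustrates how such figures could be derived from events with transaction types. It is not taken from the text: the event layout, the transaction type names ("assign", "start", "complete"), the resource names, and the definitions used (response time as the time between assignment and start, utilization as busy time divided by the length of the observation window) are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical events: (activity instance, resource, transaction type, minute).
EVENTS = [
    ("a1", "Pete", "assign", 0),
    ("a1", "Pete", "start", 10),
    ("a1", "Pete", "complete", 40),
    ("a2", "Pete", "assign", 20),
    ("a2", "Pete", "start", 40),
    ("a2", "Pete", "complete", 70),
    ("a3", "Sue", "assign", 0),
    ("a3", "Sue", "start", 5),
    ("a3", "Sue", "complete", 25),
]


def resource_statistics(events, window_start, window_end):
    """Return the mean response time and utilization per resource.

    Assumed definitions (not from the text):
      response time = start - assign
      utilization   = total (complete - start) / (window_end - window_start)
    The utilization figure is only meaningful if a resource never works on
    two activity instances at the same time.
    """
    timestamps = defaultdict(dict)  # instance -> {transaction type: time}
    performer = {}                  # instance -> resource
    for instance, resource, transaction, time in events:
        timestamps[instance][transaction] = time
        performer[instance] = resource

    waits = defaultdict(list)   # resource -> list of response times
    busy = defaultdict(float)   # resource -> total service time
    for instance, t in timestamps.items():
        if not {"assign", "start", "complete"} <= set(t):
            continue  # skip activity instances with missing events
        resource = performer[instance]
        waits[resource].append(t["start"] - t["assign"])
        busy[resource] += t["complete"] - t["start"]

    window = window_end - window_start
    return {
        resource: {
            "mean_response_time": sum(w) / len(w),
            "utilization": busy[resource] / window,
        }
        for resource, w in waits.items()
    }


stats = resource_statistics(EVENTS, window_start=0, window_end=100)
for resource, values in sorted(stats.items()):
    print(resource, values)
```

In a real log the timestamps would be dates rather than minute counters, and the observation window would have to account for working hours. Both are left out here to keep the sketch short.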
Assuming that the event log contains high-quality information, including precise
timestamps and transaction types, the behavior of resources can be analyzed in
detail [95]. Of course, privacy issues play an important role here. However, the event
log can be anonymized prior to analysis. Moreover, in most organizations one would
like to do such analysis at an aggregate level rather than at the level of individuals.
For instance, in Sect. 2.1, we mentioned the Yerkes–Dodson law of arousal, which
describes the relation between workload and performance of people. This law
hypothesizes that people work faster when the workload increases, up to a point
beyond which performance declines. If the event log contains precise timestamps
and transaction types, then it is straightforward to investigate this phenomenon
empirically. For any activity instance, one knows its duration and,
by scanning the log, one can also see what the workload was when the activity
instance was being performed by some resource. Using supervised learning (e.g.,
regression analysis or decision tree analysis), the effects of different workloads on
service and response times can be measured. See [95] for more examples.
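As a minimal sketch of the analysis described above, the Python code below pairs each activity instance with a workload value and fits a straight line by ordinary least squares. The data are made up for illustration. The definition of workload (the number of activity instances in progress at the moment an instance starts, including the instance itself) is an assumption, as are all function names; the text itself prescribes neither a workload measure nor a particular regression technique.

```python
# Hypothetical activity instances: (resource, start, complete), in minutes.
INSTANCES = [
    ("Pete", 0, 30),
    ("Sue", 10, 35),
    ("Mike", 20, 40),
    ("Pete", 50, 80),
]


def workload_at(instances, time):
    """Number of activity instances in progress at the given time.

    Assumed definition: an instance is in progress if start <= time < complete.
    """
    return sum(1 for _, start, complete in instances if start <= time < complete)


def workload_duration_pairs(instances):
    """Pair each instance's workload at its start with its duration."""
    return [
        (workload_at(instances, start), complete - start)
        for _, start, complete in instances
    ]


def least_squares(pairs):
    """Fit duration = intercept + slope * workload by ordinary least squares."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx  # requires at least two distinct workload values
    intercept = mean_y - slope * mean_x
    return slope, intercept


pairs = workload_duration_pairs(INSTANCES)
slope, intercept = least_squares(pairs)
print(pairs)
print(f"duration = {intercept:.2f} + {slope:.2f} * workload")
```

A negative slope would indicate that durations shrink as the workload grows. A straight line cannot capture a relation that reverses at high workloads, so in practice one might add a quadratic term or use a decision tree, as suggested in the text.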
Privacy and Anonymization
Event logs may contain sensitive or private data. Events refer to actions and
properties of customers, employees, etc. For instance, when applying process
mining in a hospital, it is important to ensure data privacy. It would be
unacceptable for data about patients to be used by unauthorized persons or
for event data about treatments to be used in a way not intended when
the data were released. The challenge in process mining is to use event logs to
improve processes and information systems while protecting personally iden-
tifiable information and not revealing sensitive data. Therefore, most event
logs contain anonymized attribute values. For example, the name of the
customer or employee is often irrelevant to the questions that need to be answered.
To make an attribute anonymous, the original value is mapped onto a new
value in a deterministic manner. This ensures that one can correlate attributes
in one event to attributes in another event without knowing the actual values.
For instance, all occurrences of the name “Wil van der Aalst” are mapped onto
“Q2T4R5R7X1Y9Z”. The mapping of the original value onto the anonymized
value should be such that it is hard (or even impossible) to compute the
inverse of the mapping. Anonymous data can sometimes be de-anonymized
by combining different data sources. For example, it is often possible to
identify an individual based on her birth date and the birth dates of her children.
Therefore, even “anonymous data” should be handled carefully.
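One way to obtain a deterministic mapping that is hard to invert is a keyed hash. The Python sketch below uses HMAC-SHA256 from the standard library. This is an illustrative choice rather than the method used in the text, and the function names, the key, and the attribute names are hypothetical. Without the secret key, the original value cannot feasibly be computed from the anonymized one, yet equal inputs always yield equal outputs, so events can still be correlated.

```python
import hashlib
import hmac


def anonymize(value: str, secret_key: bytes, length: int = 16) -> str:
    """Map a value onto a pseudonym deterministically using HMAC-SHA256.

    The digest is truncated for readability. Shorter pseudonyms raise the
    (small) chance that two different values map onto the same pseudonym.
    """
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:length].upper()


def anonymize_log(events, attributes, secret_key):
    """Return a copy of the events with the given attributes anonymized."""
    return [
        {
            name: anonymize(value, secret_key) if name in attributes else value
            for name, value in event.items()
        }
        for event in events
    ]


# Hypothetical key and events, for illustration only.
KEY = b"keep-this-key-secret"
LOG = [
    {"case": "1", "activity": "register request", "resource": "Wil van der Aalst"},
    {"case": "1", "activity": "decide", "resource": "Wil van der Aalst"},
    {"case": "2", "activity": "register request", "resource": "Sue"},
]

anonymized_log = anonymize_log(LOG, attributes={"resource"}, secret_key=KEY)
for event in anonymized_log:
    print(event)
```

The key must be stored separately from the released log; anyone holding it can recompute the pseudonym of a known name. A keyed hash also does nothing against the linkage problem mentioned above, since combinations of untouched attributes (such as birth dates) may still single out an individual.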