306 12 Analyzing “Spaghetti Processes”
In Sect. 5.4, we discussed the challenges related to process mining. They are of
particular relevance when dealing with Spaghetti processes. Event logs do not contain
negative examples, i.e., only positive example behavior is given. The fact that
something does not happen in an event log does not mean that it cannot happen.
For example, Fig. 12.4 is based on an event log in which almost all cases follow a
unique path (the 208 cases generate 203 different traces). Therefore, the discovery
algorithm needs to generalize. For more complex processes, i.e., processes that are
large and that allow for many behaviors, the event log is typically far from complete
(cf. Sect. 5.4.2). To further complicate matters, there may be noisy behavior, i.e.,
infrequent behavior that the user is not interested in. Because of these complications,
a discovery algorithm needs to carefully balance the four quality dimensions
introduced earlier: fitness, simplicity, precision, and generalization (see Fig. 5.22).
The process models shown in Figs. 12.1 and 12.4 illustrate the relevance of these
considerations. For the characteristics of the different process discovery algorithms,
we refer to Part II of this book. Here, we only stress the importance of carefully
filtering the event log before discovery.
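The incompleteness argument above rests on comparing the number of cases with the number of distinct traces they produce. A minimal sketch of that count follows, assuming a toy event log represented as a dictionary from case identifier to an ordered list of activity names; the log contents and the helper name `trace_variants` are hypothetical and serve only as illustration.

```python
from collections import Counter

# Hypothetical toy event log: case id -> ordered list of activity names.
log = {
    "case1": ["a", "b", "c"],
    "case2": ["a", "c", "b"],
    "case3": ["a", "b", "c"],
    "case4": ["a", "d"],
}


def trace_variants(log):
    """Count how many cases follow each distinct activity sequence."""
    return Counter(tuple(trace) for trace in log.values())


variants = trace_variants(log)
print(len(log), "cases,", len(variants), "distinct traces")
```

When the number of distinct traces is close to the number of cases, as in the log underlying Fig. 12.4, nearly every case is its own variant, and the log is unlikely to cover all possible behavior.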
Let us first consider the filtering of activities based on their characteristics, e.g.,
absolute or relative frequency. Figure 12.6(a) shows a filtering plug-in selecting all
activities that occurred in at least 5% of all cases. This ProM 5.2 plug-in is applied
to the event log used to construct Fig. 12.1, i.e., activities that do not appear
frequently are removed from the event log. As a result, the process model will be
simpler as fewer activities are included. Figure 12.6(b) shows a filtering plug-in in
ProM 6 applied to the event log used to construct Fig. 12.4. In this case, the top 80%
of activities are included; all other activities are removed from the log. The effect of
filtering is shown in Fig. 12.6(c). This C-net was obtained by selecting all activities
that occur in at least 50% of all cases handled by the housing agency. A comparison
of the process model obtained using the original event log (Fig. 12.4) with the
process model obtained using the filtered event log (Fig. 12.6(c)) demonstrates the
effect of filtering. The discovered model shows only 28 of the 74 activities appearing
in the event log of the housing agency.
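The activity-based filtering described above can be sketched in a few lines. The following is an illustrative sketch, not the ProM plug-in itself: it assumes the same toy dictionary representation of an event log (case identifier to list of activity names), and the function name `filter_activities`, the parameter `min_case_fraction`, and the example data are all hypothetical. An activity is kept if it occurs in at least the given fraction of cases; all other events are dropped from every trace.

```python
from collections import Counter


def filter_activities(log, min_case_fraction):
    """Keep only activities that occur in at least the given fraction of cases.

    Returns the filtered log and the set of activities that were kept.
    """
    n_cases = len(log)
    # Count each activity at most once per case (case frequency, not event frequency).
    case_counts = Counter(a for trace in log.values() for a in set(trace))
    keep = {a for a, c in case_counts.items() if c / n_cases >= min_case_fraction}
    # Project every trace onto the retained activities, preserving order.
    filtered = {cid: [a for a in trace if a in keep] for cid, trace in log.items()}
    return filtered, keep


# Hypothetical example log with one infrequent activity ("recheck").
log = {
    "case1": ["register", "check", "decide", "archive"],
    "case2": ["register", "decide", "archive"],
    "case3": ["register", "check", "recheck", "decide"],
    "case4": ["register", "decide"],
}

filtered, kept = filter_activities(log, min_case_fraction=0.5)
print(sorted(kept))
print(filtered["case3"])
```

Raising the threshold removes more activities and yields a simpler model at the expense of detail. Counting per case rather than per event is a design choice made here to match the "at least 5% of all cases" criterion in the text; a threshold on the share of events would be an equally plausible variant.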
In principle, any model can be made as simple as desired by abstracting
from infrequent activities. In the extreme case, the model contains only the most frequent
activity. Such a model is not very useful. However, it shows that filtering can
be used to seamlessly simplify models. Interestingly, it is sometimes useful to also
abstract from very frequent activities that are interleaved with other activities (e.g.,
some system action executed after every update). These clutter the diagram while
being less relevant. Note that there may be multiple criteria for selecting/removing
activities (e.g., average costs, duration, and risks).
Besides the simple activity-based filtering illustrated by Fig. 12.6, there are more
advanced types of filtering that transform low-level patterns into activities [13].
Moreover, the cases in the log can be partitioned into homogeneous groups as shown
in [12, 32, 46]. The basic idea is that one does not try to make one large and complex
model for all cases, but simpler models for selected groups of cases. Here, one
can use the classical clustering techniques described in Sect. 3.3 and adapt them for
process mining. To apply these techniques, feature extraction is needed to describe