9.4 Predict 255

multi-set have an impact on the reliability of the prediction. Rather than giving a
single prediction value, it is also possible to produce predictions like “With 90%
confidence the remaining flow time is predicted to be between 40 and 45 days”
or “78% of similar cases were handled within 50 days”. Moreover, as shown
in [113], it is possible to use cross-validation to determine the quality of
predictions.
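Both kinds of statement can be read off the multi-set of remaining flow times collected for a state. The minimal Python sketch below assumes that this multi-set is available as a plain list of numbers (in days) and reads “with 90% confidence” as the central interval between the empirical 5th and 95th percentiles (nearest-rank method). This is only one of several possible interval constructions. The function names and the multi-set itself are hypothetical.

```python
def percentile_nearest_rank(values, p):
    """Nearest-rank percentile for an integer p in 1..100 of a non-empty list."""
    ordered = sorted(values)
    # Integer ceiling of p * n / 100 avoids floating-point rounding surprises.
    rank = max(1, -(-p * len(ordered) // 100))
    return ordered[rank - 1]


def interval_prediction(multiset, confidence_pct=90):
    """Central interval covering roughly confidence_pct percent of past observations."""
    tail = (100 - confidence_pct) // 2  # e.g. 5 for a 90% interval
    return (percentile_nearest_rank(multiset, tail),
            percentile_nearest_rank(multiset, 100 - tail))


def fraction_within(multiset, deadline):
    """Share of observed cases whose remaining flow time was at most `deadline`."""
    return sum(1 for v in multiset if v <= deadline) / len(multiset)


# Hypothetical multi-set of remaining flow times (days) recorded for one state.
remaining = [38, 40, 41, 41, 42, 42, 43, 43, 43, 44,
             44, 44, 45, 45, 46, 47, 48, 50, 55, 60]

low, high = interval_prediction(remaining)
print(f"With 90% confidence between {low} and {high} days")
print(f"{fraction_within(remaining, 50):.0%} of similar cases handled within 50 days")
```

With these made-up numbers the sketch reports an interval of 38 to 55 days and 90% of cases handled within 50 days. A cross-validation would compute such summaries on part of the log and check them against the held-out cases.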
The approach based on an annotated transition system is not restricted to
predicting the remaining flow time. Obviously, one could predict the sojourn time in
a similar fashion. Moreover, non-time-related predictions can also be made
using the same approach. For example, suppose that we are interested in whether
the request is accepted (activity g occurs) or rejected (activity h occurs). To
make such predictions, we annotate states with information about known
outcomes for “post mortem” cases. For example, $Q^{accepted}_{[p3,p4]} = [0,1,1,1,\ldots]$. For state
[p3,p4], a “0” is added to this multi-set for each visit of a case that will be
rejected and a “1” is added for each visit of a case that will be accepted. The
average value of $Q^{accepted}_{[p3,p4]}$ is a predictor for the probability that a case
visiting state [p3,p4] will be accepted. This example shows that a wide variety
of predictions can be generated using a suitable annotated transition system. It
is important to note that process-related information is taken into account, i.e.,
the prediction is based on the state of the running case rather than on some static
attribute. Classical data mining approaches (e.g., based on regression or
decision trees) typically use static attributes of a case rather than state
information.
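The 0/1 outcome annotation described above can be sketched in a few lines of Python. The sketch makes several assumptions:

* Traces are strings of single-letter activities.
* Every completed case contains either g (accepted) or h (rejected).
* A hypothetical set-based state function (the set of activities executed so far) stands in for the marking [p3,p4] of the WF-net.

The log is invented for illustration.

```python
from collections import defaultdict


def set_state(prefix):
    """Hypothetical state function: the set of activities executed so far."""
    return frozenset(prefix)


def annotate_outcomes(log, state_fn, accept="g", reject="h"):
    """For every state, collect one 1/0 per visit of a completed case (1 = accepted)."""
    q = defaultdict(list)
    for trace in log:
        if accept in trace:
            outcome = 1
        elif reject in trace:
            outcome = 0
        else:
            continue  # outcome unknown: the case cannot be used for this annotation
        # Replay the case: the state after each prefix, including the empty prefix.
        for i in range(len(trace) + 1):
            q[state_fn(trace[:i])].append(outcome)
    return q


def acceptance_probability(q, state):
    """Average of the 0/1 annotations, or None if the state was never visited."""
    visits = q.get(state)
    return sum(visits) / len(visits) if visits else None


# Invented "post mortem" log (g = accepted, h = rejected).
log = ["acdeg", "adceh", "acdeh", "abdeg", "adceg", "abdeh"]
q = annotate_outcomes(log, set_state)
state = set_state("acd")
print(q[state], acceptance_probability(q, state))
```

Here the state reached after a, c and d (in either order) is visited by four cases, two of which are accepted, so the estimated acceptance probability is 0.5.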
The transition system shown in Fig. 9.10 happens to coincide with the states of
the WF-net and BPMN model provided earlier. However, as discussed in Sect. 6.4.1,
different transition systems can be constructed based on an event log. The event
log L and the state representation function $l^{state}()$ determine the level of detail
and the aspects considered. For example, it is possible to abstract from irrelevant
activities resulting in a more coarse-grained transition system. However, it is also
possible to include information about resources and data in the state, thus resulting
in a more fine-grained transition system. There should be sufficient visits to all states
to make reliable predictions. The transition system is too fine-grained if many states
are rarely visited when replaying log L. The level of abstraction should be consistent
with the size of the log and the response variable that needs to be predicted. For
supervised learning, this is generally referred to as the problem of feature extraction,
i.e., determining the predictor variables that are most relevant for predicting the
response variable. See [113] for more details and examples.
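The effect of the state representation function on granularity can be checked by counting how often each state is visited when replaying the log. The Python sketch below contrasts a fine-grained sequence abstraction with a coarser set abstraction, and can optionally abstract from irrelevant activities. The function names, the `keep` parameter and the small log are hypothetical. The visit threshold of 2 is a placeholder; in practice it would depend on the size of the log and on the response variable.

```python
from collections import Counter


def sequence_state(prefix, keep=None):
    """Fine-grained: the ordered prefix, optionally restricted to `keep` activities."""
    return tuple(a for a in prefix if keep is None or a in keep)


def set_state(prefix, keep=None):
    """Coarse-grained: which (kept) activities occurred, ignoring order and frequency."""
    return frozenset(a for a in prefix if keep is None or a in keep)


def visit_counts(log, state_fn):
    """Number of visits per state when replaying every trace of the log."""
    counts = Counter()
    for trace in log:
        for i in range(len(trace) + 1):
            counts[state_fn(trace[:i])] += 1
    return counts


def rarely_visited(counts, min_visits):
    """States with too few visits to support a reliable prediction."""
    return {s for s, n in counts.items() if n < min_visits}


log = ["acdeg", "adceh", "acdeh", "abdeg", "adceg", "abdeh"]
for name, fn in [("sequence", sequence_state), ("set", set_state),
                 ("set without b, c", lambda p: set_state(p, keep=set("adegh")))]:
    counts = visit_counts(log, fn)
    print(name, len(counts), "states,", len(rarely_visited(counts, 2)), "rarely visited")
```

On this toy log the sequence abstraction yields 17 states, 6 of which are visited only once. The set abstraction yields 13 states, 2 of which are rarely visited. Additionally abstracting from b and c leaves 6 states, all visited at least twice.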
The approach based on annotated transition systems is just one of many
approaches that could be used for prediction. For example, short-term simulation could
be used to explore the possible futures of a particular case in a particular state (see
Sect. 8.6). The simulation model learned from historic data is initialized with the
current state of the running case. Subsequently, the remaining lifetime of the case is
simulated repeatedly to obtain sample measurements for the performance indicator
to be predicted.
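A minimal Monte Carlo sketch of this idea in Python follows. The model is hypothetical. A hand-written table of next-step probabilities and exponentially distributed step durations stands in for a simulation model learned from historic data, and the state names are invented. Each run starts in the current state of the running case and simulates until an end state is reached. The resulting sample can then be summarized, for instance as a mean or as an interval statement.

```python
import random
from statistics import mean

# Hypothetical simulation model, standing in for one learned from historic data:
# for each state, the possible next steps as (probability, next_state, mean_duration_days).
MODEL = {
    "registered": [(1.0, "examined", 3.0)],
    "examined": [(1.0, "decided", 2.0)],
    "decided": [(0.7, "closed", 5.0), (0.3, "examined", 4.0)],  # 0.3: reinitiate
    "closed": [],  # end state: no outgoing steps
}


def simulate_remaining_time(model, current_state, rng):
    """One run from the running case's current state to an end state."""
    state, elapsed = current_state, 0.0
    while model[state]:
        steps = model[state]
        _, next_state, mean_duration = rng.choices(steps, weights=[s[0] for s in steps])[0]
        elapsed += rng.expovariate(1.0 / mean_duration)  # exponential durations (assumption)
        state = next_state
    return elapsed


def sample_remaining_times(model, current_state, runs=10_000, seed=1):
    """Repeat the simulation to obtain a sample of the remaining time."""
    rng = random.Random(seed)
    return [simulate_remaining_time(model, current_state, rng) for _ in range(runs)]


samples = sample_remaining_times(MODEL, "decided")
print(f"mean remaining time from 'decided': {mean(samples):.1f} days")
```

For this toy model the expected remaining time from "decided" satisfies E = 0.7 · 5 + 0.3 · (4 + 2 + E), i.e., E = 5.3 / 0.7 ≈ 7.6 days. The simulated mean should lie close to that value. Seeding the generator keeps runs reproducible.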
