the time window ⟨a,b,d,c,d⟩
contains the episode despite the two occurrences
of d. Therefore, episodes cannot be read as if they are process models. Moreover,
episodes cannot model choices, loops, etc. Finally, episode mining and sequence
mining cannot handle concurrency well. Sequence mining searches for sequential
patterns only. Episode mining runs into problems if there are concurrent episodes,
because it is unclear what time window to select to get meaningful episodes.
3.5.3 Other Approaches
In the data mining and machine learning communities, several other techniques have been developed to analyze sequences of events. Applications are in text mining (sequences of letters and words), bioinformatics (analysis of DNA sequences), speech recognition, web analytics, etc. Examples of techniques that are used for this purpose are neural networks and hidden Markov models [5, 67].
Artificial neural networks try to mimic the human brain in order to learn complex tasks. An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in the human brain. Different learning paradigms can be used to train a neural network: supervised learning, unsupervised learning, and reinforcement learning [5, 67]. Advantages are that neural networks can exploit parallel computing and that they can be used to solve ill-defined tasks such as image and speech recognition. The main drawback is that the resulting model (e.g., a multi-layer perceptron) is typically not human-readable. Hence, there is no resulting process model in the sense of Chap. 2 (e.g., a WF-net or BPMN model).
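To make this drawback concrete, consider the following minimal sketch (not taken from [5, 67]; it assumes scikit-learn is available and uses a hypothetical encoding of activities a–d as 0–3) of a multi-layer perceptron trained to predict the next activity from the previous two:

  from sklearn.neural_network import MLPClassifier

  # Hypothetical toy data: each row holds the last two observed activities
  # (a=0, b=1, c=2, d=3); the label is the activity that followed.
  X = [[0, 1], [1, 3], [3, 2], [0, 1], [1, 3], [3, 2]]
  y = [3, 2, 3, 3, 2, 3]

  clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
  clf.fit(X, y)
  print(clf.predict([[1, 3]]))  # most likely activity after observing b, d

  # clf.coefs_ holds the learned weight matrices; they are not human-readable
  # and yield no process model in the sense of Chap. 2.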
Hidden Markov models are an extension of ordinary Markov processes. A hidden Markov model has a set of states and transition probabilities. However, unlike in a standard Markov model, the states themselves remain hidden; only the observations emitted in the states are visible. Each state has its own observation probabilities, as shown in Fig. 3.12.
Three fundamental problems have been investigated for hidden Markov models [5]:
• Given an observation sequence, how to compute the probability of this sequence given a hidden Markov model? (A sketch of the forward algorithm, which answers this question, follows the list.)
• Given an observation sequence and a hidden Markov model, how to compute the
most likely “hidden path” in the model?
• Given a set of observation sequences, how to derive the hidden Markov model
that maximizes the probability of producing these sequences?
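As an illustration of the first problem, the following minimal sketch (plain Python with NumPy; all probabilities are hypothetical) computes the probability of an observation sequence with the forward algorithm:

  import numpy as np

  pi = np.array([0.6, 0.4])           # initial distribution over two hidden states
  A = np.array([[0.7, 0.3],           # A[i, j] = P(next state j | current state i)
                [0.4, 0.6]])
  B = np.array([[0.5, 0.4, 0.1],      # B[i, k] = P(observing symbol k | state i)
                [0.1, 0.3, 0.6]])
  obs = [0, 2, 1]                     # observation sequence (symbol indices)

  # alpha[i] = P(observations seen so far and current hidden state is i)
  alpha = pi * B[:, obs[0]]
  for o in obs[1:]:
      alpha = (alpha @ A) * B[:, o]

  print(alpha.sum())                  # P(observation sequence | model)

Replacing the summation in the recursion by a maximization (and recording the maximizing predecessor states) yields the Viterbi algorithm, which answers the second question.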
The last problem is most related to process mining, but it is also the most difficult one. The well-known Baum–Welch algorithm [5] is a so-called Expectation-Maximization (EM) algorithm that solves this problem iteratively for a fixed number of states. Although hidden Markov models are versatile and relevant for process mining, there are several complications. First of all, there are many computational challenges due to the time-consuming iterative procedures. Second, one needs to guess an appropriate number of states, as this number is input to the algorithm. Third, the resulting hidden Markov model is typically not very accessible for the end user, i.e.,