

the time window ⟨a, b, d, c, d⟩ contains the episode despite the two occurrences of d. Therefore, episodes cannot be read as if they were process models. Moreover, episodes cannot model choices, loops, etc. Finally, episode mining and sequence mining cannot handle concurrency well. Sequence mining searches for sequential patterns only. Episode mining runs into problems if there are concurrent episodes, because it is unclear which time window to select to obtain meaningful episodes.



            3.5.3 Other Approaches


In the data mining and machine learning communities, several other techniques have been developed to analyze sequences of events. Applications are in text mining (sequences of letters and words), bio-informatics (analysis of DNA sequences), speech recognition, web analytics, etc. Examples of techniques that are used for this purpose are neural networks and hidden Markov models [5, 67].
Artificial neural networks try to mimic the human brain in order to learn complex tasks. An artificial neural network is an interconnected group of nodes, akin to the vast network of neurons in the human brain. Different learning paradigms can be used to train a neural network: supervised learning, unsupervised learning, and reinforcement learning [5, 67]. Advantages are that neural networks can exploit parallel computing and that they can be used to solve ill-defined tasks, e.g., image and speech recognition. The main drawback is that the resulting model (e.g., a multi-layer perceptron) is typically not human-readable. Hence, there is no resulting process model in the sense of Chap. 2 (e.g., a WF-net or BPMN model).
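
To illustrate this drawback, consider the following sketch. It trains a tiny multi-layer perceptron, using supervised learning with gradient descent, to predict the next activity from the previous two activities. The activity alphabet, the example traces, the network size, and the learning rate are illustrative assumptions, not taken from this book; the point is merely that the learned model consists of numeric weight matrices that cannot be inspected as a process model.

```python
import numpy as np

# Hypothetical activity alphabet and example traces (illustrative only).
activities = ["a", "b", "c", "d"]
traces = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "b", "c", "d"]]
idx = {act: i for i, act in enumerate(activities)}
n = len(activities)

def one_hot(act):
    v = np.zeros(n)
    v[idx[act]] = 1.0
    return v

# Training pairs: the previous two activities predict the next activity.
X, Y = [], []
for trace in traces:
    for t in range(2, len(trace)):
        X.append(np.concatenate([one_hot(trace[t - 2]), one_hot(trace[t - 1])]))
        Y.append(one_hot(trace[t]))
X, Y = np.array(X), np.array(Y)

rng = np.random.default_rng(0)
W1 = rng.normal(0.0, 0.1, (2 * n, 8))   # input-to-hidden weights
W2 = rng.normal(0.0, 0.1, (8, n))       # hidden-to-output weights

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Supervised learning by gradient descent on the cross-entropy loss.
for _ in range(2000):
    H = np.tanh(X @ W1)                  # hidden-layer activations
    P = softmax(H @ W2)                  # predicted next-activity distribution
    G = (P - Y) / len(X)                 # gradient at the output layer
    dW2 = H.T @ G
    dW1 = X.T @ ((G @ W2.T) * (1.0 - H ** 2))
    W1 -= 0.5 * dW1
    W2 -= 0.5 * dW2

# The trained network predicts next activities, but all it offers for inspection
# are numeric weight matrices, not a readable process model such as a WF-net.
print(np.round(W1, 2))
print(np.round(W2, 2))
```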
Hidden Markov models are an extension of ordinary Markov processes. A hidden Markov model has a set of states and transition probabilities. Moreover, unlike in standard Markov models, the state itself remains hidden: in each state an observation is emitted, with observation probabilities per state as shown in Fig. 3.12. Three fundamental problems have been investigated for hidden Markov models [5]; the first two are made concrete in the sketch following this list:
            • Given an observation sequence, how to compute the probability of the sequence
              given a hidden Markov model?
            • Given an observation sequence and a hidden Markov model, how to compute the
              most likely “hidden path” in the model?
            • Given a set of observation sequences, how to derive the hidden Markov model
              that maximizes the probability of producing these sequences?
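
The sketch below illustrates the first two problems on a toy model: the forward algorithm computes the probability of an observation sequence, and the Viterbi algorithm recovers the most likely hidden path. The two-state model, its probabilities, and the observation sequence are illustrative assumptions and are not taken from Fig. 3.12.

```python
import numpy as np

# Illustrative two-state hidden Markov model (values assumed for the example).
states = ["s1", "s2"]
pi = np.array([0.6, 0.4])                 # initial state probabilities
A = np.array([[0.7, 0.3],                 # transition probabilities A[i, j] = P(j | i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],                 # observation probabilities B[i, k] = P(obs k | state i)
              [0.2, 0.8]])

obs = [0, 1, 1, 0]                        # observed sequence: a, b, b, a

# Problem 1: probability of the observation sequence (forward algorithm).
alpha = pi * B[:, obs[0]]
for o in obs[1:]:
    alpha = (alpha @ A) * B[:, o]
print("P(observations | model) =", alpha.sum())

# Problem 2: most likely hidden path (Viterbi algorithm).
delta = pi * B[:, obs[0]]
back = []
for o in obs[1:]:
    trans = delta[:, None] * A            # score of each (previous, current) state pair
    back.append(trans.argmax(axis=0))     # best predecessor for each current state
    delta = trans.max(axis=0) * B[:, o]
path = [int(delta.argmax())]
for bp in reversed(back):
    path.append(int(bp[path[-1]]))
path.reverse()
print("most likely hidden path:", [states[i] for i in path])

# Problem 3 (learning pi, A, and B from observation sequences) is solved
# iteratively by the Baum-Welch EM algorithm discussed next; it is not shown here.
```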
The last problem is the one most related to process mining, but it is also the most difficult. The well-known Baum–Welch algorithm [5] is a so-called Expectation-Maximization (EM) algorithm that solves this problem iteratively for a fixed number of states. Although hidden Markov models are versatile and relevant for process mining, there are several complications. First of all, there are many computational challenges due to the time-consuming iterative procedures. Second, one needs to guess an appropriate number of states, as this is an input to the algorithm. Third, the resulting hidden Markov model is typically not very accessible for the end user, i.e.,