
82                                                     3  Data Mining































            Fig. 3.12 A hidden Markov model with three states: s1, s2, and s3. The arcs have state
            transition probabilities as shown, e.g., in state s2 the probability of moving to state s3 is
            0.2 and the probability of moving to state s1 is 0.8. Each visit to a state generates an
            observation. The observation probabilities are also given. When visiting s2 the probability of
            observing b is 0.6 and the probability of observing c is 0.4. Possible observation sequences
            are ⟨a,b,c,d⟩, ⟨a,b,b,c⟩, and ⟨a,b,c,b,b,a,c,e⟩. For the observation sequence ⟨a,b,c,d⟩, it is
            clear what the hidden sequence is: ⟨s1,s2,s2,s3⟩. For the other two observation sequences,
            multiple hidden sequences are possible.
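
The interplay of transition and observation probabilities described in the caption can be illustrated with a minimal Python sketch. Only the s2 entries (moving to s3 with 0.2, moving to s1 with 0.8, observing b with 0.6, observing c with 0.4) are taken from the caption. All entries for s1 and s3, the initial distribution `START`, and the function names `forward` and `explanations` are hypothetical placeholders chosen for illustration; they are not read from the figure.

```python
from itertools import product

# Transition probabilities. Only the s2 row comes from the caption;
# the s1 and s3 rows are made-up placeholders.
TRANS = {
    "s1": {"s2": 0.5, "s3": 0.5},   # assumed
    "s2": {"s1": 0.8, "s3": 0.2},   # from the caption
    "s3": {"s1": 1.0},              # assumed
}

# Observation probabilities. Again, only the s2 row comes from the caption.
EMIT = {
    "s1": {"a": 1.0},               # assumed
    "s2": {"b": 0.6, "c": 0.4},     # from the caption
    "s3": {"b": 0.5, "d": 0.5},     # assumed
}

# Initial state distribution (assumed: always start in s1).
START = {"s1": 1.0}


def forward(obs):
    """Probability of the observation sequence, summed over all hidden paths."""
    alpha = {s: START.get(s, 0.0) * EMIT[s].get(obs[0], 0.0) for s in TRANS}
    for o in obs[1:]:
        alpha = {
            t: sum(alpha[s] * TRANS[s].get(t, 0.0) for s in TRANS)
            * EMIT[t].get(o, 0.0)
            for t in TRANS
        }
    return sum(alpha.values())


def explanations(obs):
    """Map each hidden state sequence with non-zero probability to that probability."""
    result = {}
    for path in product(TRANS, repeat=len(obs)):
        p = START.get(path[0], 0.0) * EMIT[path[0]].get(obs[0], 0.0)
        for prev, cur, o in zip(path, path[1:], obs[1:]):
            p *= TRANS[prev].get(cur, 0.0) * EMIT[cur].get(o, 0.0)
        if p > 0:
            result[path] = p
    return result


if __name__ == "__main__":
    observed = ["a", "b"]
    print(forward(observed))
    for path, p in explanations(observed).items():
        print(path, p)
```

Under these placeholder values the observation sequence a, b has two possible hidden sequences, because b can be produced by s2 as well as by s3. The brute-force enumeration in `explanations` grows exponentially with the sequence length and is meant only to make the notion of a hidden sequence concrete; `forward` computes the same total probability in time linear in the sequence length.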

            accurate models are typically large and even for small examples the interpretation
            of the states is difficult. Clearly, hidden Markov models are at a lower abstraction
            level than the notations discussed in Chap. 2.



            3.6 Quality of Resulting Models

            This chapter provided an overview of the mainstream data mining techniques most
            relevant for process mining. Although some of these techniques can be exploited for
            process mining, they cannot be used for important process mining tasks such as
            process discovery, conformance checking, and process enhancement. However, there is
            an additional reason for showing a variety of data mining techniques. As in data
            mining, it is non-trivial to analyze the quality of process mining results. Here one can
            benefit from experiences in the data mining field. Therefore, we discuss some of the
            validation and evaluation techniques developed for the algorithms presented in this
            chapter. First, we focus on the quality of classification results, e.g., obtained through
            a decision tree. Second, we describe general techniques for cross-validation. Here,