Page 122 -
P. 122

104                                                 4 Getting the Data


              For a given set A, A is the set of all finite sequences over A. A finite sequence
                              ∗
              over A of length n is a mapping σ ∈{1,...,n}→ A. Such a sequence is rep-
              resented by a string, i.e., σ = a 1 ,a 2 ,...,a n   where a i = σ(i) for 1 ≤ i ≤ n.


              |σ| denotes the length of the sequence, i.e. |σ|= n. σ ⊕ a = a 1 ,...,a n ,a
              is the sequence with element a appended at the end. Similarly, σ 1 ⊕ σ 2 ap-

              pends sequence σ 2 to σ 1 resulting a sequence of length |σ 1 |+|σ 2 |.
                k
              hd (σ) = a 1 ,a 2 ,...,a k min n  , i.e., the “head” of the sequence consisting of
                                                  0
              the first k elements (if possible). Note that hd (σ) is the empty sequence and
                                             k
                         k
              for k ≥ n: hd (σ) = σ. pref(σ) ={hd (σ) | 0 ≤ k ≤ n} is the set of prefixes
              of σ.
               k
              tl (σ) = a (n−k+1) max 1 ,a k+2 ,...,a n  , i.e., the “tail” of the sequence com-
                                                          0
              posed of the last k elements (if possible). Note that tl (σ) is the empty se-
                                 k
              quence and for k ≥ n: tl (σ) = σ.
              σ ↑ X is the projection of σ onto some subset X ⊆ A, e.g.,  a,b,c,a,b,
              c,d ↑{a,b}= a,b,a,b  and  d,a,a,a,a,a,a,d ↑ {d}= d,d .
              For any sequence σ = a 1 ,a 2 ,...,a n   over A, ∂ set (σ) ={a 1 ,a 2 ,...,a n }
              and ∂ multiset (σ) =[a 1 ,a 2 ,...,a n ]. ∂ set converts a sequence into a set, e.g.,
              ∂ set ( d,a,a,a,a,a,a,d ) ={a,d}. a is an element of σ, denoted as a ∈ σ,if
              and only if a ∈ ∂ set (σ). ∂ multiset converts a sequence into a multi-set, e.g.,
                                              2
                                           6
              ∂ multiset ( d,a,a,a,a,a,a,d ) =[a ,d ]. ∂ multiset (σ) is also known as the
              Parikh vector of σ. These conversions allow us to treat sequences as sets
              or bags when needed.
              An event log consists of cases and cases consist of events. The events for a case
            are represented in the form of a trace, i.e., a sequence of unique events. Moreover,
            cases, like events, can have attributes.

            Definition 4.3 (Case, trace, event log) Let C be the case universe, i.e., the set of
            all possible case identifiers. Cases, like events, have attributes. For any case c ∈ C
            and name n ∈ AN:# n (c) is the value of attribute n for case c (# n (c) =⊥ if case
            c has no attribute named n). Each case has a special mandatory attribute trace:
                      ∗ 1
            # trace (c) ∈ E . ˆc = # trace (c) is a shorthand for referring to the trace of a case.
                                                  ∗
              A trace is a finite sequence of events σ ∈ E such that each event appears only
            once, i.e., for 1 ≤ i< j ≤|σ|: σ(i)  = σ(j).
              An event log is a set of cases L ⊆ C such that each event appears at most once
            in the entire log, i.e., for any c 1 ,c 2 ∈ L such that c 1  = c 2 : ∂ set ( ˆc 1 ) ∩ ∂ set ( ˆc 2 ) =∅.

              If an event log contains timestamps, then the ordering in a trace should respect
            these timestamps, i.e., for any c ∈ L, i and j such that 1 ≤ i< j ≤|ˆc|:# time(ˆc(i)) ≤
            # time (ˆc(j)).
              Events and cases are represented using unique identifiers. An identifier e ∈ E
            refers to an event and an identifier c ∈ C refers to a case. This mechanism allows us

            1 In the remainder, we assume # trace (c)  =    , i.e., traces in a log contain at least one event.
   117   118   119   120   121   122   123   124   125   126   127