
sense of Sect. 3.6.2. In MDL, performance is judged on the training data alone and not measured against new, unseen instances. The basic idea is that the “best” model is the one that minimizes the combined encoding of model and data set. The underlying insight is that any regularity in the data can be exploited to compress the data, i.e., to describe it using fewer symbols than are needed to describe the data literally. The more regularities there are, the more the data can be compressed. Equating “learning” with “finding regularity” implies that the more we are able to compress the data, the more we have learned about the data [47]. Obviously, a data set can be encoded more compactly if valuable knowledge about the data set is captured in the model. However, encoding such knowledge also requires space. A complex, overfitting model helps to reduce the encoding of the data set, but is itself costly to store. A simple, underfitting model can be stored compactly, but does not help in reducing the encoding of the data set. Note that this idea is related to the notion of entropy in decision tree learning. When building a decision tree, the goal is to find homogeneous leaf nodes that can be encoded compactly. However, the algorithms for decision tree learning discussed in Sect. 3.2 impose no penalty on the complexity of the decision tree itself. The goal of MDL is to minimize the combined encoding length of (a) the data set encoded using the learned model and (b) the model itself. To balance between overfitting and underfitting, variable weights may be associated with both encodings.
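
To make the two-part idea concrete, the following Python sketch scores hypothetical decision trees by adding an assumed bit cost for describing the tree itself to an entropy-based encoding of the class labels in its leaves. The function names (entropy_bits, mdl_score), the weights, and the bit costs are illustrative assumptions, not definitions taken from [47] or from this chapter.

    import math
    from collections import Counter

    def entropy_bits(labels):
        # Bits per instance needed to encode the class labels with an optimal code.
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def mdl_score(leaves, model_bits, w_model=1.0, w_data=1.0):
        # Two-part MDL score: weighted cost of describing the model itself plus
        # the cost of encoding the data set given that model.
        data_bits = sum(len(leaf) * entropy_bits(leaf) for leaf in leaves)
        return w_model * model_bits + w_data * data_bits

    # Hypothetical comparison: a small tree with two fairly homogeneous leaves
    # versus a trivial, underfitting "tree" consisting of one mixed leaf.
    small_tree = mdl_score([["y"] * 9 + ["n"], ["n"] * 9 + ["y"]], model_bits=12)
    one_leaf   = mdl_score([["y"] * 10 + ["n"] * 10], model_bits=2)
    print(small_tree, one_leaf)  # the model with the lower total score is preferred

In this toy comparison the small tree pays more bits for the model but far fewer for the data, so its total score is lower; raising w_model would shift the balance toward simpler models, illustrating how the weights trade off overfitting against underfitting.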
Applying Occam’s Razor is not easy. Extracting reliable and meaningful insights from complex data is far from trivial. In fact, it is much easier to transform complex data sets into “impressive looking garbage” by abusing the techniques presented in this chapter. However, when used wisely, data mining can add tremendous value. Moreover, process mining adds the “process dimension” to data and can be used to dissect event data from a more holistic perspective. As will be shown in the remainder, process mining creates a solid bridge between process modeling and analysis on the one hand and data mining on the other.