3.6 Quality of Resulting Models 91
sense of Sect. 3.6.2. In MDL, performance is judged on the training data alone and
not measured against new, unseen instances. The basic idea is that the “best” model
is the one that minimizes the combined encoding of the model and the data set. The
key insight is that any regularity in the data can be used to compress the data, i.e.,
to describe it using fewer symbols than are needed to describe the data literally.
The more regularities there are, the more the data can be compressed. Equating
“learning” with “finding regularity” implies that the more we are able to compress
the data, the more we have learned about the data [47]. Obviously, a data set can
be encoded more compactly if valuable knowledge about the data set is captured in
the model. However, encoding such knowledge also requires space. A complex,
overfitting model helps to reduce the encoding of the data set, but requires a large
encoding itself. A simple, underfitting model can be stored compactly, but does not
help in reducing the encoding of the data set. Note that this idea is related to the
notion of entropy in decision
tree learning. When building the decision tree, the goal is to find homogeneous leaf
nodes that can be encoded compactly. However, when discussing algorithms for
decision tree learning in Sect. 3.2 there was no penalty for the complexity of the
decision tree itself. The goal of MDL is to minimize the entropy of (a) the data
set encoded using the learned model and (b) the encoding of the model itself. To
balance overfitting against underfitting, adjustable weights may be associated with
both encodings.
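The two-part trade-off can be illustrated with a minimal sketch. The setup below is a toy assumption, not the book's formalism: a "model" is simply a partition of labeled items into groups, each group is charged a hypothetical fixed `bits_per_group` model cost, and the data cost is the optimal entropy encoding of the labels within each group given the model.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits per symbol) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def mdl_score(groups, bits_per_group=8.0):
    """Two-part MDL score: model cost + data cost.

    Model cost: a hypothetical fixed number of bits per group needed to
    describe the partition (the "model"). Data cost: entropy encoding of
    the labels within each group, given that model.
    """
    model_bits = bits_per_group * len(groups)
    data_bits = sum(len(g) * entropy(g) for g in groups if g)
    return model_bits + data_bits

labels = ["a"] * 8 + ["b"] * 8

underfit = [labels]                  # one group: cheap model, costly data
overfit  = [[x] for x in labels]     # one group per item: costly model, free data
balanced = [labels[:8], labels[8:]]  # two pure groups: both encodings small

for name, g in [("underfit", underfit), ("overfit", overfit), ("balanced", balanced)]:
    print(name, round(mdl_score(g), 1))
```

On this toy data the underfitting model scores 8 + 16 = 24 bits, the overfitting model 128 + 0 = 128 bits, and the balanced model 16 + 0 = 16 bits, so minimizing the combined encoding selects the model that captures the regularity without memorizing the data.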
Applying Occam’s Razor is not easy. Extracting reliable and meaningful insights
from complex data is far from trivial. In fact, it is much easier to transform complex
data sets into “impressive looking garbage” by abusing the techniques presented in
this chapter. However, when used wisely, data mining can add tremendous value.
Moreover, process mining adds the “process dimension” to data and can be used to
dissect event data from a more holistic perspective. As will be shown in the remain-
der, process mining creates a solid bridge between process modeling and analysis
on the one hand and data mining on the other.