
90                                                     3  Data Mining

            corresponds to instances belonging to the same class. However, it is obvious that
            such a decision tree is too specific and has little predictive value.
  A learned model is underfitting if it is too general and allows for things not "supported by evidence" in the data set. Whereas overfitting can be characterized by a lack of generalization, underfitting has the opposite problem: too much generalization. Consider, for example, the generation of association rules. Generating many detailed rules due to very low settings of minsup and minconf corresponds to overfitting: many rules are found, but they are probably rather specific to the training set. Generating very few rules due to very high settings of minsup and minconf corresponds to underfitting. In the extreme case, no association rules are found at all. Note that the model with no rules fits any data set and, hence, carries no information.
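The effect of these thresholds can be sketched with a toy rule generator. The transactions and threshold values below are illustrative assumptions, not taken from the chapter's examples:

```python
from itertools import combinations

# Hypothetical transaction data set (items chosen for illustration only)
transactions = [
    {"latte", "bagel"},
    {"latte", "bagel", "muffin"},
    {"tea", "bagel"},
    {"latte", "muffin"},
    {"tea", "muffin"},
]

def support(itemset):
    """Fraction of transactions containing all items of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def rules(minsup, minconf):
    """Enumerate association rules X -> Y meeting both thresholds."""
    items = set().union(*transactions)
    found = []
    for k in range(2, len(items) + 1):
        for itemset in combinations(sorted(items), k):
            s = support(set(itemset))
            if s < minsup:
                continue  # infrequent itemsets yield no rules
            for i in range(1, len(itemset)):
                for lhs in combinations(itemset, i):
                    confidence = s / support(set(lhs))
                    if confidence >= minconf:
                        found.append((set(lhs), set(itemset) - set(lhs)))
    return found

# Low thresholds yield many specific rules (overfitting);
# high thresholds yield few or no rules (underfitting).
print(len(rules(minsup=0.2, minconf=0.5)))
print(len(rules(minsup=0.8, minconf=0.9)))
```

With the high settings no itemset of size two or more is frequent, so no rules at all are produced, mirroring the extreme case described above.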
  Underfitting is particularly problematic if the data set contains no negative examples. Consider, for example, the confusion matrix in Fig. 3.14(a). Suppose that we have a training set with only positive examples, i.e., n = 0 in the training set. How can one construct a decision tree without negative examples? Most algorithms will simply classify everything as positive. This shows that classification assumes both positive and negative examples. This is not the case for association rule learning. Consider, for example, the data set shown in Table 3.3. Suppose that the itemset {latte, tea, bagel} does not appear in the data set. This implies that no customer ordered these three items together in the training set. Can we conclude from this that it is not possible to order these three items together? Of course not! Therefore, association rule learning focuses on positive examples that are somehow frequent. Nevertheless, for some applications it would be useful to be able to discover "negative rules", such as a rule stating that customers are not allowed to order lattes, teas, and bagels in a single order.
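The positive-examples-only situation can be illustrated with a minimal sketch. The labels and the majority-class leaf rule below are illustrative assumptions, not a specific algorithm from this chapter:

```python
from collections import Counter

def leaf_prediction(labels):
    """Decision tree leaves typically predict the majority class of the
    training instances that reach them."""
    return Counter(labels).most_common(1)[0][0]

# Training set with only positive examples, i.e., n = 0 negatives
training_labels = ["+"] * 6

# No matter how the instances are split, every leaf receives an
# all-positive subset, so the resulting tree classifies every new
# instance as positive.
print(leaf_prediction(training_labels))
```

Whatever attribute tests the algorithm chooses, each leaf inherits a subset of an all-positive training set, so the learned tree degenerates into the constant classifier "positive".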
  A good balance between overfitting and underfitting is of the utmost importance for process discovery. Consider again the Petri net model shown in Fig. 1.5. The model allows for the behavior seen in the event log. It also generalizes as it allows for more sequences than present in the training set. In the event log there is no trace ⟨a, h⟩, i.e., the scenario in which a request is registered and immediately rejected does not appear in the log. This does not necessarily imply that this is not possible. However, constructing a model that allows for ⟨a, h⟩ although it is not in the log would result in a model that is clearly underfitting. This dilemma is caused by the lack of negative examples in the event log. The traces in the event log show what has happened and not what could not happen. This problem will be addressed in later chapters.
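The lack of negative examples amounts to a simple membership observation: absence from the log is not evidence of impossibility. A minimal sketch, with a hypothetical event log whose traces are only loosely inspired by the running example (a = register request, h = reject request, as in the text):

```python
# Hypothetical event log: a list of traces over single-letter activities
event_log = [
    ("a", "b", "d", "e", "h"),
    ("a", "c", "d", "e", "g"),
    ("a", "b", "d", "e", "h"),
]

def observed(trace, log):
    """True iff the trace literally occurs in the event log."""
    return trace in log

# The trace <a, h> was never observed ...
print(observed(("a", "h"), event_log))
# ... but the log only records what has happened; it contains no negative
# examples, so absence does not prove that <a, h> is impossible.
```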
              We conclude this chapter with Occam’s Razor, a principle attributed to the 14th-
            century English logician William of Ockham. The principle states that “one should
            not increase, beyond what is necessary, the number of entities required to explain
            anything”, i.e., one should look for the “simplest model” that can explain what is ob-
            served in the data set. This principle is related to finding a natural balance between
overfitting and underfitting. The Minimum Description Length (MDL) principle tries to operationalize Occam's Razor [47, 129]. According to the MDL paradigm, model quality is no longer based only on predictive performance (e.g., the F1 score), but also on the simplicity of the model. Moreover, it does not aim at cross-validation in the