Fig. 3.14 Confusion matrix for two classes and some performance measures for classifiers
• fp is the number of false positives, i.e., instances that are predicted to be positive
but should have been classified as negative.
• tn is the number of true negatives, i.e., instances that are correctly classified as
negative (these four counts are tallied in the sketch below).
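To make the four counts concrete, the following Python sketch tallies them from two lists of labels. It is a minimal illustration only: the label encoding ("positive"/"negative") and the example data are assumptions chosen for this sketch, not taken from the text.

```python
# Tally the confusion matrix counts for a binary classifier.
# The label strings and the example lists below are illustrative assumptions.
actual    = ["positive", "positive", "negative", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "positive", "positive", "negative"]

tp = sum(1 for a, p in zip(actual, predicted) if a == "positive" and p == "positive")
fn = sum(1 for a, p in zip(actual, predicted) if a == "positive" and p == "negative")
fp = sum(1 for a, p in zip(actual, predicted) if a == "negative" and p == "positive")
tn = sum(1 for a, p in zip(actual, predicted) if a == "negative" and p == "negative")

print(tp, fn, fp, tn)  # 2 1 1 2 for the example lists above
```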
Figure 3.14(a) also shows the sums of rows and columns, e.g., p = tp + fn is the
number of instances that are actually positive, and n′ = fn + tn is the number of
instances that are classified as negative by the classifier. N = tp + fn + fp + tn is
the total number of instances in the data set. Based on this it is easy to define the
measures shown in Fig. 3.14(b). The error is defined as the proportion of instances
misclassified: (fp + fn)/N. The accuracy measures the fraction of instances on the
diagonal of the confusion matrix: (tp + tn)/N. The "true positive rate", tp-rate, also
known as "hit rate", measures the proportion of positive instances indeed classified
as positive. The "false positive rate", fp-rate, also known as "false alarm rate",
measures the proportion of negative instances wrongly classified as positive. The
terms precision and recall originate from information retrieval. Precision is defined
as tp/p′, where p′ = tp + fp is the number of instances that are classified as positive.
Here, one can think of p′ as the number of documents that have been retrieved
based on some search query and tp as the number of documents that have been
retrieved and also should have been retrieved. Recall is defined as tp/p, where p can
be interpreted as the number of documents that should have been retrieved based
on some search query. It is possible to have high precision and low recall: few of
the documents searched for are returned by the query, but those that are returned
are indeed relevant. It is also possible to have high recall and low precision: many
documents are returned (including the relevant ones), but also many irrelevant
documents are returned. Note that recall is the same as tp-rate. There is another
frequently used metric not shown in Fig. 3.14(b): the so-called F1 score. The F1
score is the harmonic mean of precision and recall: (2 × precision × recall)/(precision + recall).
If either precision or recall is really poor (i.e., close to 0), then the F1 score is also
close to 0. Only if both precision and recall are good does the F1 score come close
to 1.
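All of these measures follow directly from the four counts. The sketch below computes them for the counts tallied earlier; the function name and the guards against zero denominators are assumptions made for this illustration, while the formulas themselves follow the definitions above.

```python
def performance_measures(tp, fn, fp, tn):
    """Compute the classifier measures discussed above from the four counts.

    Illustrative sketch: the zero-denominator guards are an assumption;
    the measures follow the definitions in the text and Fig. 3.14(b).
    """
    N = tp + fn + fp + tn        # total number of instances
    p = tp + fn                  # actually positive
    n = fp + tn                  # actually negative
    p_prime = tp + fp            # classified as positive

    error = (fp + fn) / N
    accuracy = (tp + tn) / N
    tp_rate = tp / p if p else 0.0    # recall / hit rate
    fp_rate = fp / n if n else 0.0    # false alarm rate
    precision = tp / p_prime if p_prime else 0.0
    recall = tp_rate
    f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) else 0.0
    return {"error": error, "accuracy": accuracy, "tp-rate": tp_rate,
            "fp-rate": fp_rate, "precision": precision, "recall": recall, "F1": f1}

# Example: the counts tallied in the earlier sketch (tp=2, fn=1, fp=1, tn=2)
print(performance_measures(2, 1, 1, 2))
# error = 1/3, accuracy = 2/3, tp-rate = fp-rate complement values as defined above:
# tp-rate = precision = recall = F1 = 2/3, fp-rate = 1/3
```

Note that with these counts precision and recall happen to coincide, so the harmonic mean equals both; in general the F1 score is pulled toward whichever of the two is smaller.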
To illustrate the different metrics, let us consider the three decision trees depicted
in Fig. 3.4. In the first two decision trees, all instances are classified as young. Note