Fig. 3.14 Confusion matrix for two classes and some performance measures for classifiers
• fp is the number of false positives, i.e., instances that are predicted to be positive
but should have been classified as negative.
• tn is the number of true negatives, i.e., instances that are correctly classified as
negative (these four counts are tallied in the sketch below).
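To make the four counts concrete, the following Python sketch tallies them from two lists of labels. It is a minimal illustration only: the label encoding ("positive"/"negative") and the example data are assumptions chosen for this sketch, not taken from the text.

```python
# Tally the confusion matrix counts for a binary classifier.
# The label strings and the example lists below are illustrative assumptions.
actual    = ["positive", "positive", "negative", "negative", "positive", "negative"]
predicted = ["positive", "negative", "negative", "positive", "positive", "negative"]

tp = sum(1 for a, p in zip(actual, predicted) if a == "positive" and p == "positive")
fn = sum(1 for a, p in zip(actual, predicted) if a == "positive" and p == "negative")
fp = sum(1 for a, p in zip(actual, predicted) if a == "negative" and p == "positive")
tn = sum(1 for a, p in zip(actual, predicted) if a == "negative" and p == "negative")

print(tp, fn, fp, tn)  # 2 1 1 2 for the example lists above
```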
Figure 3.14(a) also shows the sums of rows and columns, e.g., p = tp + fn is the
number of instances that are actually positive, and n′ = fn + tn is the number of
instances that are classified as negative by the classifier. N = tp + fn + fp + tn is
the total number of instances in the data set. Based on this it is easy to define the
measures shown in Fig. 3.14(b). The error is defined as the proportion of instances
misclassified: (fp + fn)/N. The accuracy measures the fraction of instances on the
diagonal of the confusion matrix: (tp + tn)/N. The "true positive rate", tp-rate, also
known as "hit rate", measures the proportion of positive instances indeed classified
as positive. The "false positive rate", fp-rate, also known as "false alarm rate",
measures the proportion of negative instances wrongly classified as positive. The
terms precision and recall originate from information retrieval. Precision is defined
as tp/p′, where p′ = tp + fp is the number of instances that are classified as positive.
Here, one can think of p′ as the number of documents that have been retrieved
based on some search query and tp as the number of documents that have been
retrieved and also should have been retrieved. Recall is defined as tp/p, where p can
be interpreted as the number of documents that should have been retrieved based
on some search query. It is possible to have high precision and low recall: few of
the documents searched for are returned by the query, but those that are returned
are indeed relevant. It is also possible to have high recall and low precision: many
documents are returned (including the relevant ones), but also many irrelevant
documents are returned. Note that recall is the same as tp-rate. There is another
frequently used metric not shown in Fig. 3.14(b): the so-called F1 score. The F1
score is the harmonic mean of precision and recall: (2 × precision × recall)/(precision + recall).
If either precision or recall is really poor (i.e., close to 0), then the F1 score is also
close to 0. Only if both precision and recall are good does the F1 score come close
to 1.
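All of these measures follow directly from the four counts. The sketch below computes them for the counts tallied earlier; the function name and the guards against zero denominators are assumptions made for this illustration, while the formulas themselves follow the definitions above.

```python
def performance_measures(tp, fn, fp, tn):
    """Compute the classifier measures discussed above from the four counts.

    Illustrative sketch: the zero-denominator guards are an assumption;
    the measures follow the definitions in the text and Fig. 3.14(b).
    """
    N = tp + fn + fp + tn        # total number of instances
    p = tp + fn                  # actually positive
    n = fp + tn                  # actually negative
    p_prime = tp + fp            # classified as positive

    error = (fp + fn) / N
    accuracy = (tp + tn) / N
    tp_rate = tp / p if p else 0.0    # recall / hit rate
    fp_rate = fp / n if n else 0.0    # false alarm rate
    precision = tp / p_prime if p_prime else 0.0
    recall = tp_rate
    f1 = (2 * precision * recall) / (precision + recall) if (precision + recall) else 0.0
    return {"error": error, "accuracy": accuracy, "tp-rate": tp_rate,
            "fp-rate": fp_rate, "precision": precision, "recall": recall, "F1": f1}

# Example: the counts tallied in the earlier sketch (tp=2, fn=1, fp=1, tn=2)
print(performance_measures(2, 1, 1, 2))
# error = 1/3, accuracy = 2/3, tp-rate = fp-rate complement values as defined above:
# tp-rate = precision = recall = F1 = 2/3, fp-rate = 1/3
```

Note that with these counts precision and recall happen to coincide, so the harmonic mean equals both; in general the F1 score is pulled toward whichever of the two is smaller.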
To illustrate the different metrics, let us consider the three decision trees depicted
in Fig. 3.4. In the first two decision trees, all instances are classified as young. Note