
            Fig. 3.14 Confusion matrix for two classes and some performance measures for classifiers


            • fp is the number of false positives, i.e., instances that are predicted to be positive
              but should have been classified as negative.
            • tn is the number of true negatives, i.e., instances that are correctly classified as
              negative.
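The four counts can be tallied directly from paired actual and predicted class labels. The following Python sketch illustrates this for a two-class problem; the function and variable names (confusion_counts, positive, etc.) are chosen here for illustration and do not come from the text.

    from typing import Sequence, Tuple

    def confusion_counts(actual: Sequence[str], predicted: Sequence[str],
                         positive: str = "positive") -> Tuple[int, int, int, int]:
        """Return (tp, fn, fp, tn) for a two-class problem."""
        tp = fn = fp = tn = 0
        for a, pr in zip(actual, predicted):
            if a == positive and pr == positive:
                tp += 1      # correctly classified as positive
            elif a == positive and pr != positive:
                fn += 1      # positive instance missed by the classifier
            elif a != positive and pr == positive:
                fp += 1      # negative instance wrongly classified as positive
            else:
                tn += 1      # correctly classified as negative
        return tp, fn, fp, tn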

Figure 3.14(a) also shows the sums of rows and columns, e.g., p = tp + fn is the number of instances that are actually positive, and n′ = fn + tn is the number of instances that are classified as negative by the classifier. N = tp + fn + fp + tn is the total number of instances in the data set. Based on this it is easy to define the measures shown in Fig. 3.14(b). The error is defined as the proportion of instances misclassified: (fp + fn)/N. The accuracy measures the fraction of instances on the diagonal of the confusion matrix, i.e., (tp + tn)/N. The “true positive rate”, tp-rate, also known as “hit rate”, measures the proportion of positive instances indeed classified as positive, i.e., tp/p. The “false positive rate”, fp-rate, also known as “false alarm rate”, measures the proportion of negative instances wrongly classified as positive. The terms precision and recall originate from information retrieval. Precision is defined as tp/p′. Here, one can think of p′ as the number of documents that have been retrieved based on some search query and tp as the number of documents that have been retrieved and that also should have been retrieved. Recall is defined as tp/p, where p can be interpreted as the number of documents that should have been retrieved based on some search query. It is possible to have high precision and low recall: few of the documents searched for are returned by the query, but those that are returned are indeed relevant. It is also possible to have high recall and low precision: many documents are returned (including the relevant ones), but also many irrelevant documents are returned. Note that recall is the same as tp-rate. There is another frequently used metric not shown in Fig. 3.14(b): the so-called F1 score. The F1 score takes the harmonic mean of precision and recall: (2 × precision × recall)/(precision + recall). If either precision or recall is really poor (i.e., close to 0), then the F1 score is also close to 0. Only if both precision and recall are good is the F1 score close to 1.
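As a sketch of how these measures fit together, the Python function below computes them from the four counts tp, fn, fp, and tn, assuming all relevant sums are nonzero so that no division by zero occurs; the name classifier_measures is illustrative and not part of the text.

    def classifier_measures(tp: int, fn: int, fp: int, tn: int) -> dict:
        N = tp + fn + fp + tn     # total number of instances
        p = tp + fn               # instances that are actually positive
        n = fp + tn               # instances that are actually negative
        p_pred = tp + fp          # instances classified as positive (p′)
        precision = tp / p_pred
        recall = tp / p           # identical to the tp-rate (hit rate)
        return {
            "error": (fp + fn) / N,
            "accuracy": (tp + tn) / N,
            "tp-rate": recall,
            "fp-rate": fp / n,    # false alarm rate
            "precision": precision,
            "recall": recall,
            "F1": 2 * precision * recall / (precision + recall),
        }

    # Example: tp=35, fn=15, fp=5, tn=45 yields accuracy 0.8,
    # precision 0.875, recall 0.7, and an F1 score of about 0.78.
    print(classifier_measures(35, 15, 5, 45))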
              To illustrate the different metrics let us consider the three decision trees depicted
            in Fig. 3.4. In the first two decision trees, all instances are classified as young. Note