3.6 Quality of Resulting Models

Fig. 3.13 Confusion matrix for the decision tree shown in Fig. 3.2. Of the 200 students who failed, 178 are classified as failed, 22 as passed, and none as cum laude. Of the 198 students who passed, 175 are classified correctly, 21 as failed, and 2 as cum laude

            we concentrate on k-fold cross-validation. Finally, we conclude with a more general
            discussion on Occam’s razor.



            3.6.1 Measuring the Performance of a Classifier

In Sect. 3.2, we showed how to construct a decision tree. As discussed, there are many design decisions when developing a decision tree learner (e.g., which attributes to split on, when to stop splitting, and how to determine cut values). The question is how to evaluate the performance of a decision tree learner. This is relevant both for judging the trustworthiness of the resulting decision tree and for comparing different approaches. A complication is that performance can only be judged on seen instances, although the goal is also to predict good classifications for unseen instances. For simplicity, however, let us first judge the result of a classifier (like a decision tree) on a given data set.
  Given a data set consisting of N instances, we know for each instance both the actual class and the predicted class. For example, for a particular person who smokes, we may predict that the person will die young (predicted class is “young”), even though the person dies at age 104 (actual class is “old”). This can be visualized using a so-called confusion matrix. Figure 3.13 shows the confusion matrix for the data set shown in Table 3.2 and the decision tree shown in Fig. 3.2. For each of the 420 students, the matrix relates the actual class to the class predicted by the decision tree. All instances on the diagonal are classified correctly, i.e., 178 + 175 + 18 = 371 of the 420 students are classified correctly (approximately 88%).
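  To make this computation concrete, here is a minimal sketch (in Python; not from the book, and all names are ours) that tallies a confusion matrix from paired lists of actual and predicted class labels and derives the accuracy as the fraction of instances on the diagonal:

    from collections import Counter

    def confusion_matrix(actual, predicted):
        # Count (actual, predicted) pairs; for the data behind Fig. 3.13
        # the entry ("failed", "passed") would be 22.
        return Counter(zip(actual, predicted))

    def accuracy(actual, predicted):
        # Fraction of instances on the diagonal of the confusion matrix.
        correct = sum(a == p for a, p in zip(actual, predicted))
        return correct / len(actual)

    # Illustrative toy data (not the 420 students of Table 3.2):
    actual    = ["failed", "failed", "passed", "cum laude"]
    predicted = ["failed", "passed", "passed", "cum laude"]
    print(confusion_matrix(actual, predicted)[("failed", "passed")])  # 1
    print(accuracy(actual, predicted))                                # 0.75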
  There are several performance measures based on the confusion matrix. To define these, let us consider a data set with only two classes: “positive” (+) and “negative” (−). Figure 3.14(a) shows the corresponding 2 × 2 confusion matrix with the following entries:
            • tp is the number of true positives, i.e., instances that are correctly classified as
              positive.
            • fn is the number of false negatives, i.e., instances that are predicted to be negative
              but should have been classified as positive.
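  The remaining entries of Fig. 3.14(a) are defined analogously for the negative class. As a minimal sketch (ours, not from the book; Python, assuming class labels are encoded as the strings "+" and "-"), all four entries can be counted in one pass over the actual/predicted pairs:

    def binary_confusion_entries(actual, predicted, positive="+"):
        # Returns (tp, fn, fp, tn) for a two-class data set; `positive`
        # is the label treated as "+" in Fig. 3.14(a).
        tp = fn = fp = tn = 0
        for a, p in zip(actual, predicted):
            if a == positive and p == positive:
                tp += 1  # correctly classified as positive
            elif a == positive:
                fn += 1  # positive instance predicted to be negative
            elif p == positive:
                fp += 1  # negative instance predicted to be positive
            else:
                tn += 1  # correctly classified as negative
        return tp, fn, fp, tn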