
16 Cleaning up mislabeled dev and test set examples




During error analysis, you might notice that some examples in your dev set are mislabeled. When I say "mislabeled" here, I mean that the pictures were already mislabeled by a human labeler even before the algorithm encountered them. I.e., the class label in an example (x, y) has an incorrect value for y. For example, perhaps some pictures that are not cats are mislabeled as containing a cat, and vice versa. If you suspect the fraction of mislabeled images is significant, add a category to keep track of the fraction of examples mislabeled:


| Image      | Dog | Great cat | Blurry | Mislabeled | Comments                           |
|------------|-----|-----------|--------|------------|------------------------------------|
| …          |     |           |        |            |                                    |
| 98         |     |           |        | ✔          | Labeler missed cat in background   |
| 99         |     | ✔         |        |            |                                    |
| 100        |     |           |        | ✔          | Drawing of a cat; not a real cat.  |
| % of total | 8%  | 43%       | 61%    | 6%         |                                    |
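If it helps to see the bookkeeping concretely, here is a minimal sketch of how you might tally such annotations in code. The category names, field names, and the example records are hypothetical stand-ins for whatever columns you track in your own error-analysis spreadsheet.

```python
# Sketch: tally error-analysis annotations into the "% of total" row.
# The records and category names below are illustrative, not real data.

annotations = [
    {"image": 98,  "dog": False, "great_cat": False, "blurry": False,
     "mislabeled": True,  "comment": "Labeler missed cat in background"},
    {"image": 99,  "dog": False, "great_cat": True,  "blurry": False,
     "mislabeled": False, "comment": ""},
    {"image": 100, "dog": False, "great_cat": False, "blurry": False,
     "mislabeled": True,  "comment": "Drawing of a cat; not a real cat."},
    # ... one record per misclassified dev set example you examined
]

categories = ["dog", "great_cat", "blurry", "mislabeled"]
total = len(annotations)

# Fraction of examined errors falling into each category.
# A single example can tick several boxes, so these need not sum to 100%.
for category in categories:
    count = sum(1 for row in annotations if row[category])
    print(f"{category:>10}: {100.0 * count / total:.0f}% of examined errors")
```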



             Should you correct the labels in your dev set? Remember that the goal of the dev set is to
             help you quickly evaluate algorithms so that you can tell if Algorithm A or B is better. If the
             fraction of the dev set that is mislabeled impedes your ability to make these judgments, then
             it is worth spending time to fix the mislabeled dev set labels.

             For example, suppose your classifier’s performance is:


•   Overall accuracy on dev set: 90% (10% overall error)
•   Errors due to mislabeled examples: 0.6% (6% of dev set errors)
•   Errors due to other causes: 9.4% (94% of dev set errors)

Here, the 0.6% inaccuracy due to mislabeling might not be significant enough relative to the 9.4% of errors you could be improving. There is no harm in manually fixing the mislabeled images in the dev set, but it is not crucial to do so: It might be fine not knowing whether your system has 10% or 9.4% overall error.
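As a quick check on the arithmetic, here is the same error breakdown worked out explicitly. All numbers are the illustrative figures from the text above, not measurements.

```python
# Worked version of the error decomposition above (illustrative numbers only).

overall_error = 0.10          # 10% of dev set examples are misclassified
error_from_mislabels = 0.006  # 0.6% of the dev set: "errors" caused by bad labels

error_from_other_causes = overall_error - error_from_mislabels        # 9.4%
share_of_errors_from_mislabels = error_from_mislabels / overall_error # 6%

print(f"Errors due to mislabels:    {error_from_mislabels:.1%} of dev set "
      f"({share_of_errors_from_mislabels:.0%} of all dev errors)")
print(f"Errors due to other causes: {error_from_other_causes:.1%} of dev set")

# Cleaning up the labels would move measured error only from 10% to about
# 9.4%, which is small relative to the 9.4% of errors you could still
# attack by improving the algorithm itself.
```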

             Suppose you keep improving the cat classifier and reach the following performance:



