
16 Cleaning up mislabeled dev and test set examples




During error analysis, you might notice that some examples in your dev set are mislabeled. When I say "mislabeled" here, I mean that the pictures were already mislabeled by a human labeler even before the algorithm encountered them. I.e., the class label in an example (x, y) has an incorrect value for y. For example, perhaps some pictures that are not cats are mislabeled as containing a cat, and vice versa. If you suspect the fraction of mislabeled images is significant, add a category to keep track of the fraction of examples mislabeled:


| Image      | Dog | Great cat | Blurry | Mislabeled | Comments                           |
|------------|-----|-----------|--------|------------|------------------------------------|
| …          |     |           |        |            |                                    |
| 98         |     |           |        | ✔          | Labeler missed cat in background   |
| 99         |     | ✔         |        |            |                                    |
| 100        |     |           |        | ✔          | Drawing of a cat; not a real cat.  |
| % of total | 8%  | 43%       | 61%    | 6%         |                                    |
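If it helps to see the bookkeeping concretely, here is a minimal sketch of how you might tally such annotations in code. The category names, field names, and the example records are hypothetical stand-ins for whatever columns you track in your own error-analysis spreadsheet.

```python
# Sketch: tally error-analysis annotations into the "% of total" row.
# The records and category names below are illustrative, not real data.

annotations = [
    {"image": 98,  "dog": False, "great_cat": False, "blurry": False,
     "mislabeled": True,  "comment": "Labeler missed cat in background"},
    {"image": 99,  "dog": False, "great_cat": True,  "blurry": False,
     "mislabeled": False, "comment": ""},
    {"image": 100, "dog": False, "great_cat": False, "blurry": False,
     "mislabeled": True,  "comment": "Drawing of a cat; not a real cat."},
    # ... one record per misclassified dev set example you examined
]

categories = ["dog", "great_cat", "blurry", "mislabeled"]
total = len(annotations)

# Fraction of examined errors falling into each category.
# A single example can tick several boxes, so these need not sum to 100%.
for category in categories:
    count = sum(1 for row in annotations if row[category])
    print(f"{category:>10}: {100.0 * count / total:.0f}% of examined errors")
```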



             Should you correct the labels in your dev set? Remember that the goal of the dev set is to
             help you quickly evaluate algorithms so that you can tell if Algorithm A or B is better. If the
             fraction of the dev set that is mislabeled impedes your ability to make these judgments, then
             it is worth spending time to fix the mislabeled dev set labels.

             For example, suppose your classifier’s performance is:


•   Overall accuracy on dev set: 90% (10% overall error)
•   Errors due to mislabeled examples: 0.6% (6% of dev set errors)
•   Errors due to other causes: 9.4% (94% of dev set errors)

Here, the 0.6% inaccuracy due to mislabeling might not be significant enough relative to the 9.4% of errors you could be improving. There is no harm in manually fixing the mislabeled images in the dev set, but it is not crucial to do so: It might be fine not knowing whether your system has 10% or 9.4% overall error.
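As a quick check on the arithmetic, here is the same error breakdown worked out explicitly. All numbers are the illustrative figures from the text above, not measurements.

```python
# Worked version of the error decomposition above (illustrative numbers only).

overall_error = 0.10          # 10% of dev set examples are misclassified
error_from_mislabels = 0.006  # 0.6% of the dev set: "errors" caused by bad labels

error_from_other_causes = overall_error - error_from_mislabels        # 9.4%
share_of_errors_from_mislabels = error_from_mislabels / overall_error # 6%

print(f"Errors due to mislabels:    {error_from_mislabels:.1%} of dev set "
      f"({share_of_errors_from_mislabels:.0%} of all dev errors)")
print(f"Errors due to other causes: {error_from_other_causes:.1%} of dev set")

# Cleaning up the labels would move measured error only from 10% to about
# 9.4%, which is small relative to the 9.4% of errors you could still
# attack by improving the algorithm itself.
```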

             Suppose you keep improving the cat classifier and reach the following performance:



