
18 How big should the Eyeball and Blackbox dev sets be?


















             Your Eyeball dev set should be large enough to give you a sense of your algorithm’s major
             error categories. If you are working on a task that humans do well (such as recognizing cats
             in images), here are some rough guidelines:

             • An Eyeball dev set in which your classifier makes 10 mistakes would be considered very
               small. With just 10 errors, it’s hard to accurately estimate the impact of different error
               categories. But if you have very little data and cannot afford to put more into the Eyeball
               dev set, it’s better than nothing and will help with project prioritization.

             • If your classifier makes ~20 mistakes on Eyeball dev examples, you would start to get a
               rough sense of the major error sources.


             • With ~50 mistakes, you would get a good sense of the major error sources.

             • With ~100 mistakes, you would get a very good sense of the major sources of errors. I’ve
               seen people manually analyze even more errors—sometimes as many as 500. There is no
               harm in this as long as you have enough data.
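
             One way to see why more mistakes give a clearer picture is to look at how noisy the
             estimated share of a single error category is. The sketch below is only a rough
             illustration: it assumes the analyzed mistakes behave like independent draws (a
             binomial model), and the 30% category share and the function name are hypothetical
             values chosen for the example, not figures from this chapter.

```python
import math


def category_share_std_error(share: float, num_errors: int) -> float:
    """Approximate standard error of one error category's estimated share.

    Assumption: each analyzed mistake falls into the category independently
    with probability `share`, so the estimate follows a binomial model.
    """
    return math.sqrt(share * (1.0 - share) / num_errors)


# Hypothetical category that truly accounts for 30% of all mistakes.
for num_errors in (10, 20, 50, 100):
    se = category_share_std_error(0.3, num_errors)
    print(f"{num_errors:>3} mistakes analyzed: 30% share, about +/- {se:.1%}")
```

             Under these assumptions the uncertainty shrinks only with the square root of the
             number of mistakes analyzed, which is consistent with the rough guidelines above.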

             Say your classifier has a 5% error rate. To make sure you have ~100 misclassified examples in
             the Eyeball dev set, the Eyeball dev set would have to have about 2,000 examples (since
             0.05 × 2,000 = 100). The lower your classifier’s error rate, the larger your Eyeball dev set
             needs to be in order to get a large enough set of errors to analyze.
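
             The arithmetic in the 5% example can be sketched as a small helper. This is a minimal
             illustration, not a prescribed tool: the function name and the small tolerance used to
             guard against floating-point rounding are assumptions made for the example.

```python
import math


def eyeball_dev_set_size(error_rate: float, target_mistakes: int) -> int:
    """Smallest dev set size expected to contain `target_mistakes` errors.

    Solves error_rate * size >= target_mistakes for size. The expected
    number of mistakes is an average; an actual dev set may yield a few
    more or fewer.
    """
    if not 0.0 < error_rate <= 1.0:
        raise ValueError("error_rate must be in (0, 1]")
    if target_mistakes <= 0:
        raise ValueError("target_mistakes must be positive")
    # Subtract a tiny tolerance so float noise does not round 2000.0000001 up.
    return math.ceil(target_mistakes / error_rate - 1e-9)


# Rough guideline thresholds from the text, at a 5% error rate.
for target in (10, 20, 50, 100):
    print(target, "mistakes ->", eyeball_dev_set_size(0.05, target), "examples")
# 10 mistakes -> 200 examples
# 20 mistakes -> 400 examples
# 50 mistakes -> 1000 examples
# 100 mistakes -> 2000 examples
```

             At a 1% error rate the same call, eyeball_dev_set_size(0.01, 100), gives 10,000
             examples, which shows how quickly the required size grows as accuracy improves.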

             If you are working on a task that even humans cannot do well, then the exercise of examining
             an Eyeball dev set will not be as helpful because it is harder to figure out why the algorithm
             didn’t classify an example correctly. In this case, you might omit having an Eyeball dev set.

             We discuss guidelines for such problems in a later chapter.








             Page 38                            Machine Learning Yearning-Draft                       Andrew Ng