
17 If you have a large dev set, split it into two subsets, only one of which you look at




Suppose you have a large dev set of 5,000 examples on which your algorithm has a 20% error rate. Thus, it is misclassifying ~1,000 dev images. It takes a long time to manually examine 1,000 images, so you might decide not to use all of them in the error analysis.

In this case, I would explicitly split the dev set into two subsets, one of which you look at, and one of which you don't. You will more rapidly overfit the portion that you are manually looking at. You can use the portion you are not manually looking at to tune parameters.


















Let's continue our example above, in which the algorithm is misclassifying 1,000 out of 5,000 dev set examples. Suppose we want to manually examine about 100 errors for error analysis (10% of the errors). You should randomly select 10% of the dev set and place that into what we'll call an Eyeball dev set to remind ourselves that we are looking at it with our eyes. (For a project on speech recognition, in which you would be listening to audio clips, perhaps you would call this set an Ear dev set instead.) The Eyeball dev set therefore has 500 examples, of which we would expect our algorithm to misclassify about 100.


The second subset of the dev set, called the Blackbox dev set, will have the remaining 4,500 examples. You can use the Blackbox dev set to evaluate classifiers automatically by measuring their error rates. You can also use it to select among algorithms or tune hyperparameters. However, you should avoid looking at it with your eyes. We use the term "Blackbox" because we will only use this subset of the data to obtain "Blackbox" evaluations of classifiers.
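As a minimal sketch, the Eyeball/Blackbox split described above can be done with a single random shuffle. The numbers (5,000 dev examples, a 10% Eyeball fraction, a 20% error rate) come from this chapter; the list of integers is a hypothetical stand-in for your actual dev set examples:

```python
import random

random.seed(0)  # fix the shuffle so the split is reproducible
dev_set = list(range(5000))  # stand-ins for 5,000 dev set examples
random.shuffle(dev_set)

# Take a random 10% as the Eyeball dev set; the rest is the Blackbox dev set.
eyeball_size = int(0.10 * len(dev_set))   # 10% of 5,000 -> 500 examples
eyeball_dev = dev_set[:eyeball_size]      # examine these manually
blackbox_dev = dev_set[eyeball_size:]     # automatic evaluation / tuning only

# At a 20% error rate, we expect about 100 misclassified examples
# in the Eyeball dev set -- roughly the number we wanted to inspect.
error_rate = 0.20
expected_eyeball_errors = error_rate * len(eyeball_dev)
print(len(eyeball_dev), len(blackbox_dev), expected_eyeball_errors)
# 500 4500 100.0
```

Shuffling before slicing is what makes the Eyeball subset a random sample; slicing an unshuffled dev set could give you a biased subset if the examples are stored in any meaningful order.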













             Page 36                            Machine Learning Yearning-Draft                       Andrew Ng