Page 64 -
P. 64

negative. More generally, you can make sure the fraction of examples from each class is as
               close as possible to the overall fraction in the original training set.

             I would not bother with either of these techniques unless you have already tried plotting
             learning curves and concluded that the curves are too noisy to see the underlying trends. If
             your training set is large—say over 10,000 examples—and your class distribution is not very
             skewed, you probably won’t need these techniques.


             Finally, plotting a learning curve may be computationally expensive: For example, you might
             have to train ten models with 1,000, then 2,000, all the way up to 10,000 examples. Training
             models with small datasets is much faster than training models with large datasets. Thus,
             instead of evenly spacing out the training set sizes on a linear scale as above, you might train
             models with 1,000, 2,000, 4,000, 6,000, and 10,000 examples. This should still give you a

             clear sense of the trends in the learning curves. Of course, this technique is relevant only if
             the computational cost of training all the additional models is significant.














































             Page 64                            Machine Learning Yearning-Draft                       Andrew Ng
   59   60   61   62   63   64   65   66   67   68   69