
             •   Training set: This is the data that the algorithm will learn from (e.g., Internet images +
                 Mobile images). This does not have to be drawn from the same distribution as what we
                 really care about (the dev/test set distribution).

             •   Training dev set: This data is drawn from the same distribution as the training set (e.g.,
                 Internet images + Mobile images). This is usually smaller than the training set; it only
                 needs to be large enough to evaluate and track the progress of our learning algorithm.


             •   Dev set: This is drawn from the same distribution as the test set, and it reflects the
                 distribution of data that we ultimately care about doing well on. (E.g., mobile images.)

             •   Test set: This is drawn from the same distribution as the dev set. (E.g., mobile images.)
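
The four sets described above could be built roughly as follows. This is a minimal sketch, not a prescribed procedure: the function name `split_four_way`, its parameters, and the toy `("internet", i)` / `("mobile", i)` examples are all hypothetical, and it assumes each data source fits in a Python list.

```python
import random


def split_four_way(internet, mobile, n_dev, n_test, n_train_dev, seed=0):
    """Split two data sources into training, training dev, dev, and test sets.

    Hypothetical helper: `internet` is data we can learn from, `mobile` is
    data from the distribution we ultimately care about.
    """
    rng = random.Random(seed)

    # Dev and test sets come only from the target (mobile) distribution.
    mobile = list(mobile)
    rng.shuffle(mobile)
    if n_dev + n_test > len(mobile):
        raise ValueError("not enough mobile examples for the dev and test sets")
    dev = mobile[:n_dev]
    test = mobile[n_dev:n_dev + n_test]

    # The leftover mobile examples join the internet examples in one pool.
    pool = list(internet) + mobile[n_dev + n_test:]
    rng.shuffle(pool)
    if n_train_dev >= len(pool):
        raise ValueError("training dev set would leave no training data")

    # Training dev and training sets are disjoint slices of the same shuffled
    # pool, so both are drawn from the same distribution.
    train_dev = pool[:n_train_dev]
    train = pool[n_train_dev:]

    return {"train": train, "train_dev": train_dev, "dev": dev, "test": test}


if __name__ == "__main__":
    internet = [("internet", i) for i in range(1000)]
    mobile = [("mobile", i) for i in range(100)]
    splits = split_four_way(internet, mobile, n_dev=25, n_test=25, n_train_dev=50)
    for name, examples in splits.items():
        print(name, len(examples))
```

Carving the training dev set out of the shuffled pool, rather than out of one source alone, is what keeps it on the same distribution as the training set.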

             Armed with these four separate datasets, you can now evaluate:


             •   Training error, by evaluating on the training set.

             •   The algorithm’s ability to generalize to new data drawn from the training set distribution,
                 by evaluating on the training dev set.


             •   The algorithm’s performance on the task you care about, by evaluating on the dev and/or
                 test sets.
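
The three evaluations listed above amount to computing the same error metric on different sets. The sketch below assumes a classifier exposed as a plain function `predict(x)` and examples stored as `(input, label)` pairs; `error_rate`, `evaluate_all`, and the toy data are hypothetical names and values chosen for illustration.

```python
def error_rate(predict, examples):
    """Fraction of (input, label) pairs that `predict` gets wrong."""
    if not examples:
        raise ValueError("cannot evaluate on an empty set")
    mistakes = sum(1 for x, y in examples if predict(x) != y)
    return mistakes / len(examples)


def evaluate_all(predict, splits):
    """Compute the error rate on every named split."""
    return {name: error_rate(predict, examples) for name, examples in splits.items()}


if __name__ == "__main__":
    # Toy stand-in for a trained model: predicts True for positive inputs.
    def predict(x):
        return x > 0

    # Toy data only; real splits would hold images and labels.
    splits = {
        "train": [(1, True), (2, True), (-1, False), (-2, False)],
        "train_dev": [(3, True), (-3, False), (0, True), (4, True)],
        "dev": [(0, True), (-1, True), (2, True), (-5, False)],
        "test": [(1, True), (-1, True), (-2, False), (0, True)],
    }
    for name, err in evaluate_all(predict, splits).items():
        print(f"{name}: {err:.2f}")
```

Reading the results side by side shows how the algorithm does on data it has seen (training), on unseen data from the same distribution (training dev), and on the distribution you care about (dev and test).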

             Most of the guidelines in Chapters 5-7 for picking the size of the dev set also apply to the
             training dev set.

             Page 78                            Machine Learning Yearning-Draft                       Andrew Ng