Page 15 -
P. 15

our example above) is different from the distribution you ultimately care about (mobile
             phone images).

             We usually define:

             •   Training set​ — Which you run your learning algorithm on.


             •   Dev (development) set​ — Which you use to tune parameters, select features, and
                 make other decisions regarding the learning algorithm. Sometimes also called the
                 hold-out cross validation set​.

             •   Test set​ — which you use to evaluate the performance of the algorithm, but not to make
                 any decisions regarding what learning algorithm or parameters to use.

             Once you define a dev set (development set) and test set, your team will try a lot of ideas,

             such as different learning algorithm parameters, to see what works best. The dev and test
             sets allow your team to quickly see how well your algorithm is doing.

             In other words, ​the purpose of the dev and test sets are to direct your team toward
             the most important changes to make to the machine learning system​.


             So, you should do the following:

                        Choose dev and test sets to reflect data you expect to get in the future
                        and want to do well on.

             In other words, your test set should not simply be 30% of the available data, especially if you
             expect your future data (mobile phone images) to be different in nature from your training
             set (website images).

             If you have not yet launched your mobile app, you might not have any users yet, and thus
             might not be able to get data that accurately reflects what you have to do well on in the

             future. But you might still try to approximate this. For example, ask your friends to take
             mobile phone pictures of cats and send them to you. Once your app is launched, you can
             update your dev/test sets using actual user data.

             If you really don’t have any way of getting data that approximates what you expect to get in
             the future, perhaps you can start by using website images. But you should be aware of the

             risk of this leading to a system that doesn’t generalize well.

             It requires judgment to decide how much to invest in developing great dev and test sets. But
             don’t assume your training distribution is the same as your test distribution. Try to pick test






             Page 15                            Machine Learning Yearning-Draft                       Andrew Ng
   10   11   12   13   14   15   16   17   18   19   20