
5 Your development and test sets




             Let’s return to our earlier cat pictures example: You run a mobile app, and users are
             uploading pictures of many different things to your app. You want to automatically find the
             cat pictures.

             Your team gets a large training set by downloading pictures of cats (positive examples) and
             non-cats (negative examples) from different websites. They split the dataset 70%/30% into
             training and test sets. Using this data, they build a cat detector that works well on the
             training and test sets.

             But when you deploy this classifier into the mobile app, you find that the performance is
             really poor!

             What happened?

             You figure out that the pictures users are uploading have a different look than the website

             images that make up your training set: Users are uploading pictures taken with mobile
             phones, which tend to be lower resolution, blurrier, and poorly lit. Since your training/test
             sets were made of website images, your algorithm did not generalize well to the actual
             distribution you care about: mobile phone pictures.
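The problematic setup described above can be sketched in a few lines. This is a minimal, hypothetical illustration (the `(path, is_cat)` example format and file names are assumptions, not the book's code): the split is random, so training and test sets both come entirely from the website distribution, and nothing flags that deployed data will look different.

```python
import random

def split_dataset(examples, train_frac=0.7, seed=0):
    """Randomly shuffle examples and split into train/test sets.

    Note: both resulting sets are drawn from the SAME distribution
    as `examples` -- the source of the problem described above.
    """
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical labeled examples scraped from websites: (image_path, is_cat)
data = [("web_img_%d.jpg" % i, i % 2 == 0) for i in range(10)]
train, test = split_dataset(data)
print(len(train), len(test))  # 7 3
```

A test set built this way measures performance on website images only; it says nothing about the mobile phone pictures users will actually upload.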

             Before the modern era of big data, it was a common rule in machine learning to use a
             random 70%/30% split to form your training and test sets. This practice can work, but it's a
             bad idea in more and more applications where the training distribution (website images in




             Page 14                            Machine Learning Yearning-Draft                       Andrew Ng