Page 14 -

P. 14

5 Your development and test sets

Let’s return to our earlier cat pictures example: You run a mobile app, and users are
uploading pictures of many different things to your app. You want to automatically find the
cat pictures.

Your team gets a large training set by downloading pictures of cats (positive examples) and

non-cats (negative examples) off of different websites. They split the dataset 70%/30% into
training and test sets. Using this data, they build a cat detector that works well on the
training and test sets.

But when you deploy this classifier into the mobile app, you find that the performance is
really poor!

What happened?

You figure out that the pictures users are uploading have a different look than the website

images that make up your training set: Users are uploading pictures taken with mobile
phones, which tend to be lower resolution, blurrier, and poorly lit. Since your training/test
sets were made of website images, your algorithm did not generalize well to the actual
distribution you care about: mobile phone pictures.

Before the modern era of big data, it was a common rule in machine learning to use a
random 70%/30% split to form your training and test sets. This practice can work, but it’s a

bad idea in more and more applications where the training distribution (website images in

Page 14 Machine Learning Yearning-Draft Andrew Ng

9 10 11 12 13 14 15 16 17 18 19