39 Weighting data
Suppose you have 200,000 images from the internet and 5,000 images from your mobile
app users. There is a 40:1 ratio between the size of these datasets. In theory, so long as you
build a huge neural network and train it long enough on all 205,000 images, there is no
harm in trying to make the algorithm do well on both internet images and mobile images.
But in practice, having 40x as many internet images as mobile app images means you might
need to spend 40x (or more) as much computational resources to model both, compared to
training on only the 5,000 mobile images.
If you don’t have huge computational resources, you could give the internet images a much
lower weight as a compromise.
For example, suppose your optimization objective is squared error. (This is not a good choice
for a classification task, but it will simplify our explanation.) Thus, our learning algorithm
tries to optimize:

$$\min_\theta \;\sum_{i \in \text{MobileImg}} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 \;+\; \sum_{i \in \text{InternetImg}} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
The first sum above is over the 5,000 mobile images, and the second sum is over the
200,000 internet images. You can instead optimize with an additional parameter β:

$$\min_\theta \;\sum_{i \in \text{MobileImg}} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 \;+\; \beta \sum_{i \in \text{InternetImg}} \left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$
If you set β = 1/40, the algorithm would give equal total weight to the 5,000 mobile images
and the 200,000 internet images. You can also set the parameter β to other values, perhaps
by tuning it on the dev set.
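To make the weighted objective concrete, here is a minimal NumPy sketch. The linear model h, the random data, and the function name weighted_squared_error are hypothetical stand-ins for whatever model and datasets you actually have; the point is only how β rescales the internet-image term.

```python
# Minimal sketch of the weighted squared-error objective above.
# The linear model and random data are hypothetical placeholders.
import numpy as np

def weighted_squared_error(theta, X_mobile, y_mobile,
                           X_internet, y_internet, beta):
    """Squared error over mobile images plus beta times the squared
    error over internet images (the second objective above)."""
    def h(theta, X):
        # Hypothetical model: a simple linear predictor for illustration.
        return X @ theta

    mobile_term = np.sum((h(theta, X_mobile) - y_mobile) ** 2)
    internet_term = np.sum((h(theta, X_internet) - y_internet) ** 2)
    return mobile_term + beta * internet_term

# Example: 5,000 mobile and 200,000 internet examples, 10 features each.
rng = np.random.default_rng(0)
X_mobile, y_mobile = rng.normal(size=(5_000, 10)), rng.normal(size=5_000)
X_internet, y_internet = rng.normal(size=(200_000, 10)), rng.normal(size=200_000)
theta = np.zeros(10)

# beta = 1/40 gives the two datasets equal total weight (200,000 / 40 = 5,000).
loss = weighted_squared_error(theta, X_mobile, y_mobile,
                              X_internet, y_internet, beta=1/40)
print(loss)
```

With β = 1/40, each internet image contributes 1/40 as much to the objective, so the 200,000 internet images carry the same total weight as the 5,000 mobile images. You could also sweep a few values of β and keep whichever gives the lowest dev set error.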
By weighting the additional internet images less, you don't have to build as massive a neural
network to make sure the algorithm does well on both types of tasks. This type of
re-weighting is needed only when you suspect the additional data (internet images) has a
very different distribution from the dev/test set, or when the additional data is much larger
than the data that came from the same distribution as the dev/test set (mobile images).