Page 72 -
P. 72

But in the era of big data, we now have access to huge training sets, such as cat internet
             images. Even if the training set comes from a different distribution than the dev/test set, we
             still want to use it for learning since it can provide a lot of information.

             For the cat detector example, instead of putting all 10,000 user-uploaded images into the
             dev/test sets, we might instead put 5,000 into the dev/test sets. We can put the remaining
             5,000 user-uploaded examples into the training set. This way, your training set of 205,000

             examples contains some data that comes from your dev/test distribution along with the
             200,000 internet images. We will discuss in a later chapter why this method is helpful.

             Let’s consider a second example. Suppose you are building a speech recognition system to
             transcribe street addresses for a voice-controlled mobile map/navigation app. You have
             20,000 examples of users speaking street addresses. But you also have 500,000 examples of

             other audio clips with users speaking about other topics. You might take 10,000 examples of
             street addresses for the dev/test sets, and use the remaining 10,000, plus the additional
             500,000 examples, for training.

             We will continue to assume that your dev data and your test data come from the same
             distribution. But it is important to understand that different training and dev/test
             distributions offer some special challenges.











































             Page 72                            Machine Learning Yearning-Draft                       Andrew Ng
   67   68   69   70   71   72   73   74   75   76   77