
43 Artificial data synthesis



Your speech system needs more data that sounds as if it were taken from within a car. Rather than collecting a lot of data while driving around, there might be an easier way to get this data: by artificially synthesizing it.

Suppose you obtain a large quantity of car/road noise audio clips. You can download this data from several websites. Suppose you also have a large training set of people speaking in a quiet room. If you take an audio clip of a person speaking and “add” it to an audio clip of car/road noise, you will obtain an audio clip that sounds as if that person were speaking in a noisy car. Using this process, you can “synthesize” a huge amount of data that sounds as if it were collected inside a car.
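This “adding” of noise to clean speech can be sketched as a scaled sum of the two waveforms. A minimal sketch in numpy, assuming both clips are 1-D sample arrays at the same sample rate (the function name and the target signal-to-noise ratio are illustrative, not from the text):

```python
import numpy as np

def mix_speech_with_noise(speech, noise, snr_db=10.0):
    """Overlay car/road noise on clean speech at a target
    signal-to-noise ratio (snr_db, in decibels).

    speech, noise: 1-D float arrays at the same sample rate.
    """
    # Loop (tile) the noise if it is shorter than the speech clip.
    if len(noise) < len(speech):
        reps = int(np.ceil(len(speech) / len(noise)))
        noise = np.tile(noise, reps)
    noise = noise[:len(speech)]

    # Scale the noise so the mixture has the requested SNR.
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise
```

Varying `snr_db` across synthesized examples gives mixtures ranging from barely audible noise to noise that nearly drowns out the speech.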

More generally, there are several circumstances where artificial data synthesis allows you to create a huge dataset that reasonably matches the dev set. Let’s use the cat image detector as a second example. You notice that dev set images have much more motion blur because they tend to come from cellphone users who are moving their phone slightly while taking the picture. You can take non-blurry images from the training set of internet images, and add simulated motion blur to them, thus making them more similar to the dev set.
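Simulated motion blur can be approximated by convolving the image with a short linear kernel. A minimal sketch for horizontal blur on a grayscale image, assuming a 2-D float array (the kernel length of 9 pixels is an illustrative choice):

```python
import numpy as np

def add_motion_blur(image, length=9):
    """Simulate horizontal motion blur by convolving each row of a
    grayscale image with a uniform 1-D kernel of `length` pixels."""
    kernel = np.ones(length) / length
    # 'same' mode keeps each output row the same width as the input row.
    return np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"),
        axis=1, arr=image)
```

Randomizing the blur length and direction across synthesized images helps the blurred training set cover the variety of hand movements seen in real cellphone photos.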

Keep in mind that artificial data synthesis has its challenges: it is sometimes easier to create synthetic data that appears realistic to a person than it is to create data that appears realistic to a computer. For example, suppose you have 1,000 hours of speech training data, but only 1 hour of car noise. If you repeatedly use the same 1 hour of car noise with different portions from the original 1,000 hours of training data, you will end up with a synthetic dataset where the same car noise is repeated over and over. While a person listening to this audio probably would not be able to tell—all car noise sounds the same to most of us—it is possible that a learning algorithm would “overfit” to the 1 hour of car noise. Thus, it could generalize poorly to a new audio clip where the car noise happens to sound different.

             Alternatively, suppose you have 1,000 unique hours of car noise, but all of it was taken from
             just 10 different cars. In this case, it is possible for an algorithm to “overfit” to these 10 cars
             and perform poorly if tested on audio from a different car. Unfortunately, these problems
             can be hard to spot.
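One practical way to reduce the repetition pitfall, assuming you do have several distinct noise clips, is to draw each noise segment from a randomly chosen clip at a random offset rather than reusing one fixed stretch of noise. A minimal sketch (the function name and parameters are illustrative):

```python
import numpy as np

def sample_noise_segment(noise_clips, segment_len, rng=None):
    """Draw a noise segment of `segment_len` samples from a random clip
    at a random offset, so synthesized examples do not all reuse the
    identical stretch of noise.

    noise_clips: list of 1-D arrays, each >= segment_len samples long.
    """
    rng = rng or np.random.default_rng()
    clip = noise_clips[rng.integers(len(noise_clips))]
    start = rng.integers(len(clip) - segment_len + 1)
    return clip[start:start + segment_len]
```

This spreads each hour of noise over many different mixtures, though it does not fix the deeper problem: if all your noise comes from 10 cars, no amount of resampling adds an 11th.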










             Page 82                            Machine Learning Yearning-Draft                       Andrew Ng